Upgrading Apache Flink need to update pom.xml? - apache-flink

I've just upgraded my flink from version 1.9.1 to 1.11.2 (using docker)
I have already many flink jobs running in version 1.9.1
When I try to upgrade to 1.11.1 and re run my job, it shows error.
2020-11-12 06:49:17,731 WARN org.apache.zookeeper.ClientCnxn []
- SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-1135609831848314731.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2020-11-12 06:49:17,739 INFO org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server xxxxxx:2181
2020-11-12 06:49:17,741 ERROR org.apache.curator.ConnectionState [] - Authentication failed
And this is the error after deploying my flink job:
Caused by: java.lang.RuntimeException: API paths not defined
and also:
java.lang.NoSuchMethodError: org.apache.flink.api.common.state.OperatorStateStore.getSerializableListState(Ljava/lang/String;)Lorg/apache/flink/api/common/state/ListState;
Do I need to change every pom for my flink jobs?
Is there any work around without changing my source code?
Thanks

Yes, you do have to rebuild your Flink jobs whenever you update the Flink version being used to run them. The libraries you use should be from the same exact version used by the Job Manager and Task Managers.
If you are trying to automate deployments for a CI/CD pipeline, you could inject the version number into the pom.xml using an environment variable -- but doing things like that can make it hard to debug when things go wrong.

Related

Task Manager not able to connect to Job Manager

I'm trying to upgrade our Flink cluster from 1.4.2 to 1.7.2
When I bring up the cluster, the task managers refuse to connect to the job managers with the following error.
2019-03-14 10:34:41,551 WARN akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://flink#cluster:22671] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink#cluster:22671]] Caused by: [cluster: Name or service not known]
Now, this works correctly if I add the following line into the /etc/hosts file.
x.x.x.x job-manager-address.com cluster
Why is Flink 1.7.2 connecting to JM using cluster in the address? Flink 1.4.2 used to have the job manager's address instead of the word cluster.
The jobmanager.sh script was being invoked with a second argument called cluster.
${Flink_HOME}/bin/jobmanager.sh start cluster
Prior to 1.5, the script expected an execution mode (local or cluster) but this is no longer the case. Invoking the script without the second argument solved this issue.
${Flink_HOME}/bin/jobmanager.sh start
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-1-7-2-Task-Manager-not-able-to-connect-to-Job-Manager-td26707.html
https://github.com/apache/flink/commit/d61664ca64bcb82c4e8ddf03a2ed38fe8edafa98
https://github.com/apache/flink/blob/c6878aca6c5aeee46581b4d6744b31049db9de95/flink-dist/src/main/flink-bin/bin/jobmanager.sh#L21-L25

Apache Flink Kubernetes Job Arguments

I'm trying to setup a cluster (Apache Flink 1.6.1) with Kubernetes and get following error when I run a job on it:
2018-10-09 14:29:43.212 [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-10-09 14:29:43.214 [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.flink.runtime.entrypoint.ClusterConfiguration.<init>(Ljava/lang/String;Ljava/util/Properties;[Ljava/lang/String;)V
at org.apache.flink.runtime.entrypoint.EntrypointClusterConfiguration.<init>(EntrypointClusterConfiguration.java:37)
at org.apache.flink.container.entrypoint.StandaloneJobClusterConfiguration.<init>(StandaloneJobClusterConfiguration.java:41)
at org.apache.flink.container.entrypoint.StandaloneJobClusterConfigurationParserFactory.createResult(StandaloneJobClusterConfigurationParserFactory.java:78)
at org.apache.flink.container.entrypoint.StandaloneJobClusterConfigurationParserFactory.createResult(StandaloneJobClusterConfigurationParserFactory.java:42)
at org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:55)
at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:153)
My job takes a configuration file (file.properties) as a parameter. This works fine in standalone mode but apparently the Kubernetes cluster cannot parse it
job-cluster-job.yaml:
args: ["job-cluster", "--job-classname", "com.test.Abcd", "-Djobmanager.rpc.address=flink-job-cluster",
"-Dparallelism.default=1", "-Dblob.server.port=6124", "-Dquery.server.ports=6125", "file.properties"]
How to fix this?
Update: The job was built for Apache 1.4.2 and this might be the issue, looking into it.
The job was built for 1.4.2, the class with the error (EntrypointClusterConfiguration.java) was added in 1.6.1 (https://github.com/apache/flink/commit/ab9bd87e521d19db7c7d783268a3532d2e876a5d#diff-d1169e00afa40576ea8e4f3c472cf858) it seems, so this caused the issue.
We updated the job's dependencies to point to new 1.6.1 release and the arguments are parsed correctly.

[AWS Glue]: org.apache.thrift.TApplicationException: Internal error processing createInterpreter

I'm trying to use zeppelin-0.8.0 to connect to AWS Glue Development endpoint and when executing a cell below error occurs.
And there is no helpful message to understand what could be the problem. Any leads appreciated
172318_1906434757 is finished, status: ERROR, exception: java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing createInterpreter, result: %text org.apache.thrift.TApplicationException: Internal error processing createInterpreter
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_createInterpreter(RemoteInterpreterService.java:209)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.createInterpreter(RemoteInterpreterService.java:192)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:169)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:165)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
UPDATE: So as in the answer below looks like 0.8.0 doesn't work with Glue yet.. I had problems running 0.7.x aw well with the javax.ws.rx package having a bunch of MethodNotFoundException when running with Java 8(did not help update-alternative to Java 7 as well). But when running inside a JDK 7 docker container it worked with no problems and was able to connect to my Dev end point. Highly appreciate if anyone can clarify the root cause of it
Could you please provide more information, such as zeppin instance location. Is it running on your desktop/laptop or is it running as AWS Notebook server? Also did you try connecting to zeppelin 0.7.3 version, as mentioned here in this AWS forum link :
https://forums.aws.amazon.com/thread.jspa?threadID=285128
As per the above link dated Jul 2018, think AWS Glue doesn't yet support Zeppelin 0.8 version.
I am assuming all other configurations, environment settings are done as needed. Can help more, if you can provide additional info.
UPDATE:
Anyway, please refer here and setting up zeppelin on windows, for any help on setting up local development environment & zeppelin notebook.
Once you set up the zeppelin notebook, have an SSH connection established (using AWS Glue DevEndpoint URL), so you can have access to the data catalog/crawlers,etc., and also the S3 bucket where your data resides. Then, you can create your python scripts in the zeppelin notebook, and run from the zeppelin.
You can use dev instance provided by Glue, but you may incur additional costs for the same(EC2 instance charges).
Environment settings (updated in response to comments):
JAVA_HOME=E:\Java7\jre7
Path=E:\Python27;E:\Python27\Lib;E:\Python27\Scripts;
PYTHONPATH=E:\spark-2.1.0-bin-hadoop2.7\python;E:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;E:\spark-2.1.0-bin-hadoop2.7\python\lib\pys
park.zip
SPARK_HOME=E:\spark-2.1.0-bin-hadoop2.7
Change the drive name/ folders accordingly. Let me know if any help neeed.

Collecting Metrics with Graphite Plugin leads to "A metric named [..] already exists" error

when i configure the flink-conf.yaml to collect metrics with the graphite plugin,
the most time only incomplete metrics are being sent. On the Taskmanager output multiple errors occur like:
2018-08-15 00:58:59,016 WARN org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while registering metric.
java.lang.IllegalArgumentException: A metric named mycomputer.taskmanager.8ceab4c3dfbf9fc5fa2af0447f1373a1.State machine job.Source: Custom Source.0.numRecordsOut already exists
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:329)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:379)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:312)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:302)
at org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.<init>(OperatorIOMetricGroup.java:41)
at org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.<init>(OperatorMetricGroup.java:48)
at org.apache.flink.runtime.metrics.groups.TaskMetricGroup.addOperator(TaskMetricGroup.java:146)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:174)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:143)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
I've tried this on a completely freshly prepared flink-1.6.0 release with following config and the precompiled "State machine job" in the examples folder:
metrics.reporters: grph
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.interval: 1 SECONDS
metrics.reporter.grph.protocol: TCP
I use the official graphite docker image (https://hub.docker.com/r/graphiteapp/docker-graphite-statsd/) that is running on the default configuration.
Has anybody an idea, how i can fix this issue?
Thank's and best regards
update
to exclude that a specific local setting is responsible for this behaviour, I repeated the process on a clean EC2 instance. There's exactly the same error here.
How to reproduce:
start EC2 t2.xlarge
installed java
download flink at https://www.apache.org/dyn/closer.lua/flink/flink-1.6.0/flink-1.6.0-bin-scala_2.11.tgz
added the flink-metrics-graphite-1.6.0.jar to lib
configured the flink-yaml.conf as mentioned in my previous post
./bin/start-cluster.sh
./bin/flink run examples/streaming/StateMachineExample.jar
I have not set up graphite in this case, because the error obviously already
occurs before.
After the job has been started you can view the error in the flink dashboard under Task Manager -> Logs

WLP :: Worklight :: Can't install runtime

I'm using Worklight 6.2 server edition and I can't deploy a working runtime (of other environments) on my server.
I'm using webpshere liberty profile v8.5.5 and when I deploy the runtime via GUI it says success and on server.xml I can see the new configuration for the app.
However when I go to the worklightconsole I don't see my runtime to upload the app.
On messages.log there is a error regarding JMX connection.
The quoted error is
Failed to obtain JMX connection to access an MBean. There might be a JMX configuration error: No JMX connector is configured
I'm refering this because I've seen some post on SO saying that these issues might be connected. However I have the restConnector-1.0 on my WLP features.
Reference: No runtime on my Worklight 6.2 Console after installing analytics
On messages.log there is some other things that I found interesting, like the correct start of the runtime I've deployed
[11/12/14 5:50:45:177 CST] 00000012 com.worklight.server.bundle.project.JeeProjectActivator I FWLST0002I: ========= Project /HelloWorld started. The project WAR file version is 6.2.0.00.20140922-2259,running on server version 6.2.0.00.20140613-0730. [project HelloWorld]
and two erros while starting my server
[11/12/14 5:50:49:911 CST] 00000012 SystemErr R 24 WorklightPU WARN [Scheduled Executor-thread-1] openjpa.Runtime - An error occurred while registering a ClassTransformer with PersistenceUnitInfo: name 'WorklightPU', root URL [file:/opt/IBM/WebSphere/Liberty/usr/shared/resources/worklight/lib/worklight-jee-library.jar]. The error has been consumed. To see it, set your openjpa.Runtime log level to TRACE. Load-time class transformation will not be available.
Second error:
java.lang.RuntimeException: Timeout while waiting for the management service to start up
I don't know what these are but I think it might be related to my problem and this errors eventually appear when I start my server.
Does anyone have any tips for troubleshooting this issue?
Thanks in advance.
This is a known issue from Websphere.
There is a APAR to fix that, a workaround is to restart the server with the --clean option to force a refresh onto the shared libraries.
http://www-01.ibm.com/support/docview.wss?uid=swg1PI17830

Resources