Unable to run a Python Flink application on cluster - apache-flink

I am trying to run a Python Flink application on a standalone Flink cluster. The application works fine on a single-node cluster, but it throws the following error on a multi-node cluster:
java.lang.Exception: The user defined 'open()' method caused an exception: An error occurred while copying the file.
Please help me resolve this problem. Thank you
The application I am trying to execute has the following code.
from flink.plan.Environment import get_environment
from flink.plan.Constants import INT, STRING, WriteMode
env = get_environment()
data = env.from_elements("Hello")
data.map(lambda x: list(x)).output()
env.execute()

You have to configure "python.dc.tmp.dir" in "flink-conf.yaml" to point to a distributed filesystem (like HDFS). This directory is used to distribute the Python scripts.
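To check a node's configuration before resubmitting, the sketch below parses the plain "key: value" lines of flink-conf.yaml and reports whether "python.dc.tmp.dir" uses a distributed-filesystem URI scheme. The set of schemes and the "hdfs://namenode:8020/..." address are illustrative assumptions, and the parser is deliberately minimal (it ignores nested YAML), so treat it as a sketch rather than a validator.

```python
from urllib.parse import urlparse

# Schemes treated as "shared" are an assumption for illustration;
# extend the set to match the filesystems your cluster actually uses.
DISTRIBUTED_SCHEMES = {"hdfs", "s3", "s3a", "maprfs"}


def parse_flink_conf(text):
    """Parse the simple 'key: value' lines of a flink-conf.yaml file."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything without a key separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so "hdfs://host:port" stays intact.
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf


def dc_tmp_dir_is_shared(conf):
    """Return True if python.dc.tmp.dir points at a distributed filesystem."""
    value = conf.get("python.dc.tmp.dir")
    if value is None:
        return False  # unset: treated here as not shared
    return urlparse(value).scheme in DISTRIBUTED_SCHEMES


conf = parse_flink_conf("python.dc.tmp.dir: hdfs://namenode:8020/tmp/flink-python\n")
print(dc_tmp_dir_is_shared(conf))  # True
```

A plain local path such as /tmp/flink-python has no URI scheme, so the check returns False for it, which is the single-node setup that works locally but not across TaskManagers.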

Related

Setting up a Flink cluster with Podman for a Beam pipeline with FlinkRunner

My goal is to create a streaming pipeline to read data from Apache Kafka, process the data, and write back to it.
Because of security reasons, I want to avoid Docker and use Podman.
I have set up a minimal cluster via a docker-compose.yml with a jobmanager, a taskmanager and a Python SDK harness worker. The SDK harness worker seems to get stuck when I try to execute a pipeline.
When I run the pipeline (which reads a multi-line .txt file and writes it back to a file), it is transferred to the jobmanager and taskmanager correctly, but then goes idle. The logs in the pythonsdk container show the following message repeatedly:
2022/12/04 16:13:02 Starting worker pool 1: python -m apache_beam.runners.worker.worker_pool_main --service_port=50000 --container_executable=/opt/apache/beam/boot
Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', '--logging_endpoint=localhost:45087', '--artifact_endpoint=localhost:35323', '--provision_endpoint=localhost:36435', '--control_endpoint=localhost:33237']
2022/12/04 16:16:31 Failed to obtain provisioning information: failed to dial server at localhost:36435
caused by:
context deadline exceeded
Here is a link to the test pipeline I created:
Example on GitHub
Environment:
Debian 11;
Podman;
Python 3.2.9;
apache-beam==2.38.0; and
podman-compose
The cluster setup is defined in docker-compose.yml:
1x flink-jobmanager (flink version 1.14)
1x flink-taskmanager
1x Python Harness SDK
I chose to create an SDK container manually because I don't have Docker installed, and Flink fails when it tries to create a container via Docker.
I suspect that I have made a mistake in the network setup or there are some configurations missing for the harness worker, but I could not figure out the problem. Any thoughts?
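Reading the log above, every endpoint the boot binary is told to dial is on localhost. Inside a separate SDK container, localhost refers to that container itself, so if the services behind those ports live in the taskmanager's container, the dial would time out, which would match the "context deadline exceeded" message. That is a hypothesis, not a confirmed diagnosis. The helper below (hypothetical names, standard library only) extracts the *_endpoint flags from the "Starting worker with command" line and lists the ones pointing at a loopback host, which may make it quicker to compare runs while experimenting with the network setup.

```python
import re

# Matches flags such as --provision_endpoint=localhost:36435 inside the
# "Starting worker with command [...]" line of the worker pool log.
ENDPOINT_FLAG = re.compile(r"--(\w+_endpoint)=([^'\s,\]]+)")

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def endpoints_from_log(line):
    """Map each *_endpoint flag in a worker command line to (host, port)."""
    endpoints = {}
    for name, address in ENDPOINT_FLAG.findall(line):
        host, _, port = address.rpartition(":")
        endpoints[name] = (host, int(port))
    return endpoints


def loopback_endpoints(endpoints):
    """Return the endpoint names whose host is a loopback address."""
    return sorted(name for name, (host, _) in endpoints.items()
                  if host in LOOPBACK_HOSTS)


LOG_LINE = ("Starting worker with command ['/opt/apache/beam/boot', '--id=1-1', "
            "'--logging_endpoint=localhost:45087', '--artifact_endpoint=localhost:35323', "
            "'--provision_endpoint=localhost:36435', '--control_endpoint=localhost:33237']")

print(loopback_endpoints(endpoints_from_log(LOG_LINE)))
# ['artifact_endpoint', 'control_endpoint', 'logging_endpoint', 'provision_endpoint']
```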
Cross-posted to the Beam user mailing list (beam.apache.org)

Upgrading Apache Flink: do I need to update pom.xml?

I've just upgraded Flink from version 1.9.1 to 1.11.2 (using Docker).
I already have many Flink jobs running on version 1.9.1.
When I upgrade to 1.11.1 and rerun my job, it shows this error:
2020-11-12 06:49:17,731 WARN org.apache.zookeeper.ClientCnxn [] - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-1135609831848314731.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2020-11-12 06:49:17,739 INFO org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server xxxxxx:2181
2020-11-12 06:49:17,741 ERROR org.apache.curator.ConnectionState [] - Authentication failed
And this is the error after deploying my Flink job:
Caused by: java.lang.RuntimeException: API paths not defined
and also:
java.lang.NoSuchMethodError: org.apache.flink.api.common.state.OperatorStateStore.getSerializableListState(Ljava/lang/String;)Lorg/apache/flink/api/common/state/ListState;
Do I need to change the pom.xml of every Flink job?
Is there any workaround that doesn't require changing my source code?
Thanks
Yes, you do have to rebuild your Flink jobs whenever you update the Flink version being used to run them. The libraries you use should be from the same exact version used by the Job Manager and Task Managers.
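One way to catch this before deploying is a small pre-deploy check that compares the Flink version a job was built against with the version the cluster runs. The sketch below assumes the pom declares the version in a "flink.version" property under the standard Maven POM namespace, as Flink's quickstart poms do; the function names are hypothetical and the cluster version is passed in by hand.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; poms without it would need a different lookup.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_flink_version(pom_xml):
    """Return the <flink.version> property from a Maven pom, or None."""
    root = ET.fromstring(pom_xml)
    node = root.find("m:properties/m:flink.version", POM_NS)
    return node.text.strip() if node is not None and node.text else None


def check_versions(pom_xml, cluster_version):
    """Raise if the job's Flink dependency version differs from the cluster's."""
    job_version = pom_flink_version(pom_xml)
    if job_version != cluster_version:
        raise ValueError(
            f"job built against Flink {job_version}, cluster runs {cluster_version}")


POM = ('<project xmlns="http://maven.apache.org/POM/4.0.0">'
       "<properties><flink.version>1.9.1</flink.version></properties></project>")
print(pom_flink_version(POM))  # 1.9.1
```

Calling check_versions(POM, "1.11.2") raises, which is the mismatch behind errors like the NoSuchMethodError in the question.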
If you are trying to automate deployments in a CI/CD pipeline, you could inject the version number into the pom.xml using an environment variable, but doing things like that can make it hard to debug when things go wrong.
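A minimal sketch of that injection step, assuming the pom holds exactly one "flink.version" property and that the CI system exports a variable named FLINK_VERSION (a hypothetical name). To limit the debugging pain mentioned above, it fails loudly when the variable or the property is missing instead of silently building against the old version. Maven can also interpolate environment variables itself via ${env.VAR_NAME} in the pom; the script is only one option.

```python
import os
import re

# Hypothetical variable name; use whatever your CI system exports.
FLINK_VERSION_ENV = "FLINK_VERSION"

_PROPERTY = re.compile(r"(<flink\.version>)[^<]*(</flink\.version>)")


def inject_flink_version(pom_text, version):
    """Return pom_text with the <flink.version> property set to version."""
    new_text, count = _PROPERTY.subn(
        lambda m: m.group(1) + version + m.group(2), pom_text)
    if count != 1:
        raise ValueError(f"expected one <flink.version> property, found {count}")
    return new_text


def inject_from_env(pom_text, env=os.environ):
    """Read the target version from the environment and patch the pom text."""
    return inject_flink_version(pom_text, env[FLINK_VERSION_ENV])  # KeyError if unset


print(inject_from_env("<flink.version>1.9.1</flink.version>",
                      {"FLINK_VERSION": "1.11.2"}))
# <flink.version>1.11.2</flink.version>
```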

Collecting Metrics with Graphite Plugin leads to "A metric named [..] already exists" error

When I configure flink-conf.yaml to collect metrics with the Graphite plugin, most of the time only incomplete metrics are sent. In the TaskManager output, multiple errors like the following occur:
2018-08-15 00:58:59,016 WARN org.apache.flink.runtime.metrics.MetricRegistryImpl - Error while registering metric.
java.lang.IllegalArgumentException: A metric named mycomputer.taskmanager.8ceab4c3dfbf9fc5fa2af0447f1373a1.State machine job.Source: Custom Source.0.numRecordsOut already exists
at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:329)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:379)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:312)
at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:302)
at org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.<init>(OperatorIOMetricGroup.java:41)
at org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.<init>(OperatorMetricGroup.java:48)
at org.apache.flink.runtime.metrics.groups.TaskMetricGroup.addOperator(TaskMetricGroup.java:146)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:174)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:143)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
I've tried this on a freshly prepared flink-1.6.0 release with the following config and the precompiled "State machine job" in the examples folder:
metrics.reporters: grph
metrics.reporter.grph.class: org.apache.flink.metrics.graphite.GraphiteReporter
metrics.reporter.grph.host: localhost
metrics.reporter.grph.port: 2003
metrics.reporter.grph.interval: 1 SECONDS
metrics.reporter.grph.protocol: TCP
I use the official graphite docker image (https://hub.docker.com/r/graphiteapp/docker-graphite-statsd/) that is running on the default configuration.
Does anybody have an idea how I can fix this issue?
Thanks and best regards
Update
To rule out a specific local setting being responsible for this behaviour, I repeated the process on a clean EC2 instance. The exact same error occurs there.
How to reproduce:
start an EC2 t2.xlarge instance
install Java
download Flink from https://www.apache.org/dyn/closer.lua/flink/flink-1.6.0/flink-1.6.0-bin-scala_2.11.tgz
add flink-metrics-graphite-1.6.0.jar to lib/
configure flink-conf.yaml as described in my original post
./bin/start-cluster.sh
./bin/flink run examples/streaming/StateMachineExample.jar
I have not set up Graphite in this case, because the error already occurs before any connection to Graphite is needed.
After the job has started, you can view the error in the Flink dashboard under Task Manager -> Logs.
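One hypothesis consistent with the trace (not confirmed in the question): with Flink's default scope formats, metrics.scope.task and metrics.scope.operator differ only in using <task_name> versus <operator_name>. For the source, the task and its operator are both named "Source: Custom Source", so a task-level and an operator-level numRecordsOut would get the same identifier, and a reporter that keeps one flat Dropwizard registry rejects the second one, as the com.codahale.metrics.MetricRegistry.register frame suggests. The toy simulation below only illustrates that mechanism; the registry class and variable values are hypothetical, and the scope strings should be checked against the documentation for your Flink version.

```python
# Default scope formats as documented for Flink 1.x; treat as an assumption.
TASK_SCOPE = "<host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>"
OPERATOR_SCOPE = "<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>"


def render(scope, variables):
    """Substitute <var> placeholders in a scope format."""
    for key, value in variables.items():
        scope = scope.replace(f"<{key}>", value)
    return scope


class FlatRegistry:
    """Toy stand-in for a reporter that keeps one flat name -> metric map."""

    def __init__(self):
        self.metrics = {}

    def register(self, name, metric):
        if name in self.metrics:
            raise ValueError(f"A metric named {name} already exists")
        self.metrics[name] = metric


def collides(variables, metric="numRecordsOut"):
    """True if the task-level and operator-level metric names clash."""
    registry = FlatRegistry()
    registry.register(f"{render(TASK_SCOPE, variables)}.{metric}", "task counter")
    try:
        registry.register(f"{render(OPERATOR_SCOPE, variables)}.{metric}", "operator counter")
    except ValueError:
        return True
    return False


EXAMPLE_VARS = {
    "host": "mycomputer",
    "tm_id": "8ceab4c3dfbf9fc5fa2af0447f1373a1",
    "job_name": "State machine job",
    "task_name": "Source: Custom Source",      # the source task ...
    "operator_name": "Source: Custom Source",  # ... and its only operator
    "subtask_index": "0",
}
print(collides(EXAMPLE_VARS))  # True
```

If this reading is right, giving metrics.scope.operator a layout that cannot coincide with metrics.scope.task should avoid the clash, though that is untested here.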

Apache Zeppelin running on Spark throws java ConnectionException

I have a question about my Apache Zeppelin installation.
I downloaded zeppelin-0.5.5-incubating-bin-all,
configured export JAVA_HOME=/sparkDemo/java-1.8.0-openjdk in zeppelin-env.sh and set zeppelin.server.port to 8084 in zeppelin-site.xml. I didn't configure SPARK_HOME in zeppelin-env.sh because I want to use Zeppelin's embedded Spark libraries.
But when I run the Zeppelin tutorial code in my browser, the following error occurs:
Even when I configure SPARK_HOME and export MASTER in zeppelin-env.sh, and create a new interpreter in the Zeppelin web UI, the same error occurs.
Thanks a lot for responding!
Stack Trace here
As mentioned in other answers, the most probable cause is that the interpreter process quit due to some error.
More details on the particular error can be found in:
Interpreter process log
./logs/zeppelin-interpreter-<interpreter name>-<username>-<hostname>.log
and ZeppelinServer process log under
./logs/zeppelin-<username>-<hostname>.log
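To pull up those files quickly, here is a small standard-library sketch that lists the interpreter and server logs in ./logs, newest first, and returns the tail of one of them. The file-name patterns follow the two paths above; the function names are hypothetical.

```python
from pathlib import Path


def _newest_first(paths):
    """Sort paths by modification time, most recent first."""
    return sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)


def zeppelin_logs(log_dir="./logs"):
    """Return (interpreter logs, server logs) found in log_dir."""
    logs = Path(log_dir)
    interpreter = _newest_first(logs.glob("zeppelin-interpreter-*.log"))
    # Server logs share the "zeppelin-" prefix, so exclude interpreter logs.
    server = _newest_first(
        p for p in logs.glob("zeppelin-*.log")
        if not p.name.startswith("zeppelin-interpreter-"))
    return interpreter, server


def tail(path, lines=50):
    """Return the last `lines` lines of a log file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.readlines()[-lines:]
```

For example, print("".join(tail(zeppelin_logs()[0][0]))) shows the end of the most recent interpreter log, which is usually where the reason for the interpreter quitting appears.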

How to run managed-vm-gae example code locally

I followed this tutorial to get a Bigtable client up and running in Google Managed VMs. But is there a way to run this locally? Deploying the code remotely during development is a pain.
Normally I can use dev_appserver.sh to run a GAE app locally. But when I run it, I get this error:
Caused by: java.lang.IllegalStateException: Jetty ALPN has not been properly configured.
Does this mean we need to include the ALPN library? Since our codebase is on Java 7, I used this ALPN version: 7.1.3.v20150130.
I then tried again with this:
dev_appserver.sh --jvm_flag=-Xbootclasspath/p:/Users/shouguoli/tmp/alpn-boot-7.1.3.v20150130.jar
still getting this error:
Caused by: com.google.apphosting.api.ApiProxy$CallNotFoundException: The API package 'urlfetch' or call 'Fetch()' was not found.
How do you get it to work locally?
The sample was updated last week. It's based on the Java 8 compat runtime, which means that you have access to most of the App Engine APIs, including Users, Task Queues, and Datastore.
There is a new Netty TCNative module that uses BoringSSL.
To use it with the pom.xml in the sample, do:
mvn clean -Pmac jetty:run -Dbigtable.projectID=<your-project> -Dbigtable.clusterID=<your-cluster> -Dbigtable.zone=<your-zone>
To use it on Windows, use -Pwindows instead of -Pmac. For Linux, omit the -P profile flag, since Linux is the default.
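The platform rule above (-Pmac on macOS, -Pwindows on Windows, no profile on Linux) can be scripted so the same command works on every developer machine. This is a hypothetical helper, not part of the sample: it only builds the argument list for the mvn jetty:run command shown above, mapping Python's platform.system() values to profiles, and the placeholder arguments stand in for your own project, cluster and zone.

```python
import platform

# platform.system() value -> Maven profile flags, per the rule above.
_PROFILES = {"Darwin": ["-Pmac"], "Windows": ["-Pwindows"], "Linux": []}


def jetty_run_command(project, cluster, zone, system=None):
    """Build the 'mvn clean [-P...] jetty:run' argument list for this OS."""
    system = system or platform.system()
    if system not in _PROFILES:
        raise ValueError(f"no TCNative profile known for {system!r}")
    return ["mvn", "clean", *_PROFILES[system], "jetty:run",
            f"-Dbigtable.projectID={project}",
            f"-Dbigtable.clusterID={cluster}",
            f"-Dbigtable.zone={zone}"]


print(" ".join(jetty_run_command("my-project", "my-cluster", "my-zone", system="Darwin")))
# mvn clean -Pmac jetty:run -Dbigtable.projectID=my-project -Dbigtable.clusterID=my-cluster -Dbigtable.zone=my-zone
```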
To deploy:
mvn clean gcloud:deploy -Dbigtable.projectID=<your-project> -Dbigtable.clusterID=<your-cluster> -Dbigtable.zone=<your-zone>
NOTE: it is advisable to run clean when switching between running locally and running remotely, as the TCNative module is currently specific to the platform the code runs on.
We are in the process of updating all of our samples to use TCNative, we hope to have this by 3/10/16.
