PyFlink - Kafka - Missing module - apache-flink

I am trying to get started with PyFlink and Kafka, but I get the error below.
Thanks for your support!
Installation
python -m pip install apache-flink
pip install pyFlink
Code
from pyFlink.datastream import StreamExecutionEnvironment
Error
ModuleNotFoundError: No module named 'pyFlink'

To install PyFlink, you only need to execute:
python -m pip install apache-flink
and make sure you have a compatible Python version (>= 3.5).
Imports are case-sensitive; the error is thrown because the package name is "pyflink", not "pyFlink". So, instead, you can try:
from pyflink.datastream import StreamExecutionEnvironment
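To see the case sensitivity for yourself, here is a minimal sketch using the standard library's importlib. The helper name module_available is made up for illustration, and it only checks top-level module names:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # With apache-flink installed, only the lowercase spelling should be found.
    for candidate in ("pyflink", "pyFlink"):
        print(candidate, "->", module_available(candidate))
```

If both spellings report False, the package is probably not installed in the interpreter you are running.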
If you're going to use Kafka, remember to also add the required JAR dependencies. Assuming t_env is your TableEnvironment, you can do this with:
config = t_env.get_config().get_configuration()
config.set_string("pipeline.jars",
"file:///path/to/jar/jarfile.jar")
You can read more about handling connectors and other dependencies in the PyFlink documentation.
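If you need more than one JAR, pipeline.jars accepts several file:// URLs separated by semicolons. The sketch below builds that string from local paths. The helper names build_pipeline_jars and add_jars and the example paths are hypothetical, and add_jars assumes you already have a TableEnvironment named t_env as in the snippet above:

```python
from pathlib import Path


def build_pipeline_jars(jar_paths):
    """Join local JAR paths into one semicolon-separated list of file:// URLs."""
    return ";".join(Path(p).absolute().as_uri() for p in jar_paths)


def add_jars(t_env, jar_paths):
    """Set pipeline.jars on an existing TableEnvironment (same call as above)."""
    config = t_env.get_config().get_configuration()
    config.set_string("pipeline.jars", build_pipeline_jars(jar_paths))


if __name__ == "__main__":
    # Hypothetical paths; replace them with the connector JARs you downloaded.
    print(build_pipeline_jars(["/path/to/jar/connector.jar",
                               "/path/to/jar/format.jar"]))
```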

Related

pyflink, ImportError: No module named pyflink

I am testing pyflink on
os: centos7
flink version: flink-1.14.3
virtualenv python version: Python 3.6.8
pip list:
apache-beam 2.27.0
apache-flink 1.14.3
apache-flink-libraries 1.14.3
avro-python3 1.9.2.1
certifi 2021.10.8
charset-normalizer 2.0.11
cloudpickle 1.2.2
crcmod 1.7
dill 0.3.1.1
docopt 0.6.2
fastavro 0.23.6
future 0.18.2
grpcio 1.43.0
hdfs 2.6.0
httplib2 0.17.4
idna 3.3
mock 2.0.0
numpy 1.19.5
oauth2client 4.1.3
pandas 1.1.5
pbr 5.8.1
pip 21.3.1
protobuf 3.17.3
py4j 0.10.8.1
pyarrow 2.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pydot 1.4.2
pyflink 1.0
pymongo 3.12.3
pyparsing 3.0.7
python-dateutil 2.8.0
pytz 2021.3
requests 2.27.1
rsa 4.8
setuptools 59.6.0
six 1.16.0
typing-extensions 3.7.4.3
urllib3 1.26.8
wheel 0.37.1
I tried to run this command:
(virtualenv) [myuser#myvm flink-1.14.3] ./bin/flink run -py examples/python/table/word_count.py
And got the following error:
Caused by: java.io.IOException: Failed to execute the command: python -c import pyflink;import os;print(os.path.join(os.path.abspath(os.path.dirname(pyflink.__file__)), 'bin'))
output: Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: No module named pyflink
I am sure the pyflink package is already installed. Does anyone know why?
To install PyFlink, you only need to execute:
python -m pip install apache-flink
and make sure you have a compatible Python version (>= 3.5).
The problem may be the Python virtual environment; see https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/python/faq/#preparing-python-virtual-environment
You could also try adding the option '-pyexec venv.zip/venv/bin/python3'.
Check that pyflink is correctly installed in your venv.
Also check whether Flink is running.
If it is not, start it with:
start-cluster.sh
The full PyFlink documentation is here:
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/python/overview/
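One detail worth checking: the message "ImportError: No module named pyflink" (without quotes around the name) is the format Python 2 prints, whereas Python 3 raises ModuleNotFoundError. That suggests the python command Flink launched may not be the virtualenv's interpreter; on CentOS 7 the system python is 2.7 by default. A minimal sketch for comparing interpreters follows; the helper name can_import is made up for illustration:

```python
import subprocess
import sys


def can_import(interpreter: str, module: str) -> bool:
    """Return True if `interpreter -c "import module"` exits successfully."""
    try:
        result = subprocess.run(
            [interpreter, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The interpreter itself could not be started (e.g. not on PATH).
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # "python" is what the failing command in the traceback invoked;
    # sys.executable is the interpreter running this script (e.g. the venv's).
    for interpreter in ("python", sys.executable):
        print(interpreter, "->", can_import(interpreter, "pyflink"))
```

If the two lines disagree, pointing Flink at the right interpreter (for example with the -pyexec option mentioned above) is the likely fix.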

Install interpreter for Zeppelin

I need a custom installation of interpreters for Apache Zeppelin. I don't need all of them, only md, shell, python (default), jdbc, and spark (default). I tried several approaches, but they all failed:
Install online via command
./bin/install-interpreter.sh --name md,shell,jdbc
But I received an error like this:
Install jdbc(org.apache.zeppelin:zeppelin-jdbc:0.8.0) to /opt/zeppelin-0.8.2-bin-netinst/interpreter/jdbc ...
org.sonatype.aether.RepositoryException: Cannot fetch dependencies for org.apache.zeppelin:zeppelin-jdbc:0.8.0
at org.apache.zeppelin.dep.DependencyResolver.getArtifactsWithDep(DependencyResolver.java:179)
at org.apache.zeppelin.dep.DependencyResolver.loadFromMvn(DependencyResolver.java:128)
at org.apache.zeppelin.dep.DependencyResolver.load(DependencyResolver.java:76)
at org.apache.zeppelin.dep.DependencyResolver.load(DependencyResolver.java:93)
at org.apache.zeppelin.dep.DependencyResolver.load(DependencyResolver.java:85)
at org.apache.zeppelin.interpreter.install.InstallInterpreter.install(InstallInterpreter.java:170)
at org.apache.zeppelin.interpreter.install.InstallInterpreter.install(InstallInterpreter.java:134)
at org.apache.zeppelin.interpreter.install.InstallInterpreter.install(InstallInterpreter.java:126)
at org.apache.zeppelin.interpreter.install.InstallInterpreter.main(InstallInterpreter.java:278)
Caused by: java.lang.NullPointerException
at org.sonatype.aether.impl.internal.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:352)
at org.apache.zeppelin.dep.DependencyResolver.getArtifactsWithDep(DependencyResolver.java:176)
... 8 more
I tried the following configuration to fix it:
In zeppelin-site.xml
<property>
<name>zeppelin.interpreter.dep.mvnRepo</name>
<value>https://repo1.maven.org/maven2/</value>
<description>Remote principal repository for interpreter's additional dependency loading</description>
</property>
and in zeppelin-env.sh
export ZEPPELIN_INTERPRETER_DEP_MVNREPO="https://repo1.maven.org/maven2/"
I changed http to https, but it had no effect.
Install offline
I downloaded the JAR files from mvnrepository and ran
bin/install-interpreter.sh --name md --artifact /tmp/zeppelin-jar/zeppelin-markdown-0.8.2.jar &&
bin/install-interpreter.sh --name shell --artifact /tmp/zeppelin-jar/zeppelin-shell-0.8.2.jar &&
bin/install-interpreter.sh --name jdbc --artifact /tmp/zeppelin-jar/zeppelin-jdbc-0.8.2.jar
But the packages depend on many other JARs that would also need to be downloaded. For example,
zeppelin-shell-0.8.2 needs these dependencies:
org.apache.commons » commons-exec
org.apache.commons » commons-lang3
org.apache.zeppelin » zeppelin-interpreter
org.slf4j » slf4j-api
org.slf4j » slf4j-log4j12
How can I install the interpreters? I expected to be able to install them online via the command, but it seems to fail because of the network. I am installing from my PC at my company.
Thank you all very much.
I downloaded the full version of Zeppelin 0.9.0. Problem resolved.

pipenv install glob fails

I tried to install glob in my Python virtual environment (version 3.5) and got the error below. I found similar questions here, but they were not much help.
$pipenv install glob
Installing glob…
Collecting glob
Error: An error occurred while installing glob!
Could not find a version that satisfies the requirement glob (from versions: )
No matching distribution found for glob
The issue is that pipenv looks up the package in the URL specified under [[source]] in the Pipfile, and glob is not there. However, glob is part of the Python standard library, so you do not need to install it via pipenv; just add 'import glob' to your script and it should work.
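As a quick check that no installation is needed, here is a minimal sketch that uses only the standard library. The helper name list_matching and the file names are made up for illustration:

```python
import glob
import os
import tempfile


def list_matching(directory: str, pattern: str):
    """Return sorted file names in `directory` that match a shell-style pattern."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths)


if __name__ == "__main__":
    # Create a few throwaway files, then list only the .txt ones.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.txt", "b.txt", "notes.md"):
            open(os.path.join(tmp, name), "w").close()
        print(list_matching(tmp, "*.txt"))
```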
You are using Python 3.x.
Here are the glob packages to install:
For Python 2.7:
sudo pip install glob2
For Python 3.7:
sudo pip3 install glob3

Installing zeppelin on CentOS6 with lens get compilation error

I cloned Zeppelin from https://github.com/apache/incubator-zeppelin.git and built the project by running:
mvn clean package -Pspark-1.5 -Dhadoop.version=2.2.0 -Phadoop-2.2 -Ppyspark -DskipTests
but I always get the error:
Most probably, this indicates a dependency version mismatch, in this case with Apache Lens.
The best approach is to try rebuilding Apache Zeppelin from the latest master and, if the issue persists, file an issue on the official project JIRA.

rethinkdb index-rebuild complains python driver is missing

Ran into this error when trying to run the rethinkdb index-rebuild command:
Error when launching 'rethinkdb-index-rebuild': No such file or
directory The rethinkdb-index-rebuild command depends on the RethinkDB
Python driver, which must be installed. If the Python driver is
already installed, make sure that the PATH environment variable
includes the location of the backup scripts, and that the current user
has permission to access and run the scripts.
Yet I have the rethinkdb Python module installed and the path set up properly:
Requirement already satisfied (use --upgrade to upgrade): rethinkdb in
/Library/Python/2.7/site-packages Cleaning up...
Why doesn't this work?
If the rethinkdb-index-rebuild script is not in your PATH, you might be able to invoke the index-rebuild command as
python -mrethinkdb._index_rebuild
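To check whether the script really is on your PATH, here is a minimal sketch using the standard library's shutil.which. The helper name find_on_path is made up for illustration:

```python
import shutil


def find_on_path(command: str):
    """Return the full path of `command` if it is on PATH, else None."""
    return shutil.which(command)


if __name__ == "__main__":
    location = find_on_path("rethinkdb-index-rebuild")
    if location is None:
        print("rethinkdb-index-rebuild is not on PATH")
    else:
        print("found at", location)
```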
Turns out it was a feature implemented in a newer version of the Python module. Solved it by:
sudo pip install --upgrade rethinkdb
