flink cluster startup error [ERROR] Could not get JVM parameters properly - apache-flink

$ bin/start-cluster.sh
Starting cluster.
[INFO] 1 instance(s) of standalonesession are already running on centos1.
Starting standalonesession daemon on host centos1.
[ERROR] Could not get JVM parameters properly.
[ERROR] Could not get JVM parameters properly.
I have set $JAVA_HOME on the master and on all the slaves:
$ echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/
Below are the config file settings.
jobmanager.rpc.address: 10.0.2.4
# The RPC port where the JobManager is reachable.
jobmanager.rpc.port: 6123
taskmanager.numberOfTaskSlots: 5
parallelism.default: 2
JPS in master:
# jps
30944 QuorumPeerMain
9600 StandaloneSessionClusterEntrypoint
31640 ConsoleProducer
32889 Jps
31278 Kafka
On the slave, the jps command is not available:
# jps
-bash: jps: command not found
Also, under the Task Managers view I don't see any entries:
http://10.x.x.x:8081/#/task-manager

Did you configure the conf/slaves file? See https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/cluster_setup.html#configuring-flink.
The problem with jps doesn't look like a problem with Flink. Is there really a JDK on the slave?
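If it helps, here is a quick way to check both points; the worker IPs are placeholders, and java-1.8.0-openjdk-devel is the CentOS package that provides jps (jps ships with the JDK, not the JRE):
# conf/slaves — one worker host per line
10.0.2.5
10.0.2.6
# on each slave, confirm a JDK is installed and on the PATH
ssh 10.0.2.5 'java -version; which jps'
sudo yum install -y java-1.8.0-openjdk-devel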

Related

Solr / Zookeeper : "An exception was thrown while closing send thread"

I am trying Solr for the first time on RHEL 8 with Openjdk version "17.0.2".
I am following the tutorial https://solr.apache.org/guide/8_11/solr-tutorial.html. I get the warning:
WARN - 2022-04-20 12:07:20.762; org.apache.zookeeper.ClientCnxn; An exception was thrown while closing send thread for session 0x10003e1057e0003. => EndOfStreamException: Unable to read additional data from server sessionid 0x10003e1057e0003, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x10003e1057e0003, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[zookeeper-3.6.2.jar:3.6.2]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[zookeeper-3.6.2.jar:3.6.2]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275) ~[zookeeper-3.6.2.jar:3.6.2]
This should be a straightforward tutorial. Do you know what I am missing?
Here is the tutorial output from the start:
[solr#abc294837 ~]$ ./bin/solr start -e cloud
Welcome to the SolrCloud example!
This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]:
Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
Please enter the port for node2 [7574]:
Solr home directory /opt/solr/example/cloud/node1/solr already exists.
/opt/solr/example/cloud/node2 already exists.
Starting up Solr on port 8983 using command:
"/opt/solr/bin/solr" start -cloud -p 8983 -s "/opt/solr/example/cloud/node1/solr"
Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=50226). Happy searching!
Starting up Solr on port 7574 using command:
"/opt/solr/bin/solr" start -cloud -p 7574 -s "/opt/solr/example/cloud/node2/solr" -z localhost:2181
Waiting up to 180 seconds to see Solr running on port 7574 [-]
Started Solr server on port 7574 (pid=50417). Happy searching!
INFO - 2022-04-20 12:07:20.502; org.apache.solr.common.cloud.ConnectionManager; Waiting for client to connect to ZooKeeper
INFO - 2022-04-20 12:07:20.553; org.apache.solr.common.cloud.ConnectionManager; zkClient has connected
INFO - 2022-04-20 12:07:20.556; org.apache.solr.common.cloud.ConnectionManager; Client is connected to ZooKeeper
INFO - 2022-04-20 12:07:20.631; org.apache.solr.common.cloud.ZkStateReader; Updated live nodes from ZooKeeper... (0) -> (2)
INFO - 2022-04-20 12:07:20.737; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:2181 ready
WARN - 2022-04-20 12:07:20.762; org.apache.zookeeper.ClientCnxn; An exception was thrown while closing send thread for session 0x10003e1057e0003. => EndOfStreamException: Unable to read additional data from server sessionid 0x10003e1057e0003, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x10003e1057e0003, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[zookeeper-3.6.2.jar:3.6.2]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[zookeeper-3.6.2.jar:3.6.2]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1275) ~[zookeeper-3.6.2.jar:3.6.2]
Now let's create a new collection for indexing documents in your 2-node cluster.
Please provide a name for your new collection: [gettingstarted]
You are not missing anything; this is Zookeeper falsely warning about a socket connection being closed.
[EDIT]: This has been fixed in Solr versions 8.11.2 and 9.0.0 (Zookeeper versions 3.6.4, 3.7.1, 3.8.1, 3.9.0).
We can see in this commit that the exception is caught and expected (the comment says "closing so this is expected"), yet it is reported as a warning and a stack trace is logged, although it is not an error per se. So you can treat this message as a debug message (which is what it was before that commit).
See for reference this issue, caused by this issue, and this pull request for the fix.
We can still quiet Zookeeper from the Solr/log4j config by raising the level of its logger from "WARN" to "ERROR":
solr/solr/server/resources/log4j2-console.xml
<AsyncLogger name="org.apache.zookeeper" level="ERROR"/>
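For placement, that line goes inside the <Loggers> section of the file; a minimal sketch, leaving the rest of your existing configuration untouched:
<Loggers>
  <!-- demote Zookeeper's close-time warning; only ERROR and above get logged -->
  <AsyncLogger name="org.apache.zookeeper" level="ERROR"/>
  ...
</Loggers>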

My systemd Solr service will not enable. No error message

Earlier today I was able to get a systemd definition working for solr. Then I made an edit to the definition and tried to reload it, and now the service is somehow no longer enabled, and I can't enable it. If I do...
systemctl enable solr.service
...there is no output. No error message. But then if I do...
systemctl -l | grep solr
...there is nothing there. It seems to be falling back on System V when I run "service solr start". Solr starts, but it isn't using the systemd definition.
If I run "systemctl status solr.service", I see...
# systemctl status solr.service
● solr.service - Apache SOLR
Loaded: loaded (/etc/systemd/system/solr.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2020-10-15 18:34:09 EDT; 15min ago
Main PID: 44533 (code=exited, status=0/SUCCESS)
Evidently the service was enabled, but I needed a slightly different sequence of commands to start it. I also seem to need the following in my systemd definition:
Type=forking
I don't think I needed that earlier, but it seems to be necessary now. After each change to my solr.service file, I need to run "systemctl daemon-reload" and then restart solr.
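For reference, here is a minimal unit file consistent with the status output below; the User line and the /opt/solr paths are assumptions for a default install:
[Unit]
Description=Apache SOLR
After=network.target

[Service]
Type=forking
User=solr
ExecStart=/opt/solr/bin/solr start
ExecStop=/opt/solr/bin/solr stop
Restart=on-failure

[Install]
WantedBy=multi-user.target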
I also needed to ensure solr was not already running (with System V), and then start it with systemctl, like so...
service solr stop
systemctl start solr
Then I could run "systemctl status solr.service" and get a better result...
# systemctl status solr.service
● solr.service - Apache SOLR
Loaded: loaded (/etc/systemd/system/solr.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2020-10-15 19:27:34 EDT; 2min 49s ago
Process: 78134 ExecStart=/opt/solr/bin/solr start (code=exited, status=0/SUCCESS)
Main PID: 78181 (java)
Tasks: 55
Memory: 2.1G
CGroup: /system.slice/solr.service
└─78181 java -server -Xms31g -Xmx31g -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePer...
Separately, I don't know why I was getting no output from...
systemctl -l | grep solr
The systemctl command by itself is equivalent to "systemctl list-units", which only lists units that are loaded and active. The -l flag is actually an alias for --full, which merely stops long names from being truncated; it is -a/--all that includes inactive units.
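So to find the unit regardless of its state, something like:
# include inactive units and don't truncate names
systemctl list-units --all --full | grep solr
# or query the installed unit files directly
systemctl list-unit-files | grep solr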

Unable to launch Apache Flink 1.11.1 in Windows 10 system by using ./bin/start-cluster.sh

I have installed apache_flink-1.11.1-bin-scala_2.11 on my Windows system and added FLINK_HOME to my environment variables. I also installed Cygwin to get a Linux environment on Windows. When I try to launch Flink by running ./bin/start-cluster.sh, it is unable to start the cluster. The error messages are:
The execution result is empty
and
Could not get JVM Parameters and dynamic configurations properly.
Parameters under flink-conf.yaml are
jobmanager.rpc.address: 192.168.1.101
jobmanager.rpc.port: 6123
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
parallelism.default: 1
plus one additional worker node. I'm not able to debug the issue; please help me sort out this configuration setup.
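One sanity check worth trying under Cygwin: a common pitfall there is Windows CRLF line endings in the config file and scripts, which can break the startup scripts' parsing (dos2unix availability is assumed):
# confirm the JVM is reachable from Cygwin's shell
java -version
cd "$FLINK_HOME"
# strip CRLF line endings that Windows editors may have introduced
dos2unix conf/flink-conf.yaml bin/*.sh
./bin/start-cluster.sh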

Apache Flink Kubernetes Job Arguments

I'm trying to set up a cluster (Apache Flink 1.6.1) with Kubernetes and get the following error when I run a job on it:
2018-10-09 14:29:43.212 [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2018-10-09 14:29:43.214 [main] INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.flink.runtime.entrypoint.ClusterConfiguration.<init>(Ljava/lang/String;Ljava/util/Properties;[Ljava/lang/String;)V
at org.apache.flink.runtime.entrypoint.EntrypointClusterConfiguration.<init>(EntrypointClusterConfiguration.java:37)
at org.apache.flink.container.entrypoint.StandaloneJobClusterConfiguration.<init>(StandaloneJobClusterConfiguration.java:41)
at org.apache.flink.container.entrypoint.StandaloneJobClusterConfigurationParserFactory.createResult(StandaloneJobClusterConfigurationParserFactory.java:78)
at org.apache.flink.container.entrypoint.StandaloneJobClusterConfigurationParserFactory.createResult(StandaloneJobClusterConfigurationParserFactory.java:42)
at org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:55)
at org.apache.flink.container.entrypoint.StandaloneJobClusterEntryPoint.main(StandaloneJobClusterEntryPoint.java:153)
My job takes a configuration file (file.properties) as a parameter. This works fine in standalone mode, but apparently the Kubernetes cluster cannot parse it.
job-cluster-job.yaml:
args: ["job-cluster", "--job-classname", "com.test.Abcd", "-Djobmanager.rpc.address=flink-job-cluster",
"-Dparallelism.default=1", "-Dblob.server.port=6124", "-Dquery.server.ports=6125", "file.properties"]
How to fix this?
Update: the job was built for Apache Flink 1.4.2, and this might be the issue; looking into it.
The job was built for 1.4.2, while the class from the error (EntrypointClusterConfiguration.java) was only added in 1.6.1 (https://github.com/apache/flink/commit/ab9bd87e521d19db7c7d783268a3532d2e876a5d#diff-d1169e00afa40576ea8e4f3c472cf858), so that caused the issue.
We updated the job's dependencies to point to the new 1.6.1 release, and the arguments are now parsed correctly.
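For illustration, if the job is a Maven build with a flink.version property (both assumptions), the fix amounts to bumping the version and rebuilding:
# uses the versions-maven-plugin; flink.version is assumed to exist in the POM
mvn versions:set-property -Dproperty=flink.version -DnewVersion=1.6.1
mvn clean package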

FLINK : Deployment took more than 60 seconds

I am new to Flink and trying to deploy my jar on an EMR cluster. I am using a 3-node cluster (1 master and 2 slaves) and sticking with its default configuration; I have not made any configuration changes. On running the following command on my master node:
flink run -m yarn-cluster -yn 2 -c Main /home/hadoop/myjar-0.1.jar
I am getting the following error:
INFO org.apache.flink.yarn.YarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
Can anyone please explain what could be the possible reason for this error?
As you didn't specify any resources (memory, CPU cores), I guess it's because the YARN cluster does not have the desired resources available, especially memory.
Try submitting your jar file using the following type of commands:
flink run -m yarn-cluster -yn 5 -yjm 768 -ytm 1400 -ys 2 -yqu streamQ my_program.jar
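For reference, the y-prefixed flags pass options through to YARN:
# -yn  : number of TaskManager containers to allocate
# -yjm : JobManager container memory (MB)
# -ytm : TaskManager container memory (MB)
# -ys  : processing slots per TaskManager
# -yqu : the YARN queue to submit into
flink run -m yarn-cluster -yn 5 -yjm 768 -ytm 1400 -ys 2 -yqu streamQ my_program.jar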
You can find more information about the command here
You can check application logs in YARN WebUI to see what's the problem exactly.
Also, check these posts:
Post1
Post2
