Flink: Deployment took more than 60 seconds

I am new to Flink and am trying to deploy my jar on an EMR cluster. I am using a 3-node cluster (1 master and 2 slaves) with the default configuration; I have not made any configuration changes. On running the following command on my master node:
flink run -m yarn-cluster -yn 2 -c Main /home/hadoop/myjar-0.1.jar
I am getting the following error:
INFO org.apache.flink.yarn.YarnClusterDescriptor- Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
Can anyone please explain what could be the possible reason for this error?

As you didn't specify any resources (memory, CPU cores), I guess the YARN cluster does not have the resources Flink is requesting, especially memory.
Try submitting your jar file with a command like the following:
flink run -m yarn-cluster -yn 5 -yjm 768 -ytm 1400 -ys 2 -yqu streamQ my_program.jar
You can find more information about the command here
You can check the application logs in the YARN web UI to see exactly what the problem is.
Also, check these posts:
Post1
post2
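A rough way to see whether a submission like the one above can fit: add up one JobManager container (-yjm) and -yn TaskManager containers (-ytm), round each up to the scheduler's allocation increment, and compare the total with the free memory shown in the YARN web UI. This is a minimal sketch under assumptions: the function names are hypothetical, the 1024 MB increment is YARN's default yarn.scheduler.minimum-allocation-mb (your cluster may differ), and it ignores vcores and per-node limits, so a True result does not guarantee the containers can actually be placed.

```python
import math


def round_up(mb: int, increment_mb: int) -> int:
    """Round one container request up to a multiple of the scheduler increment."""
    return math.ceil(mb / increment_mb) * increment_mb


def requested_memory_mb(num_tms: int, jm_mb: int, tm_mb: int, increment_mb: int = 1024) -> int:
    """Memory a per-job submission asks YARN for: one JobManager plus num_tms TaskManagers."""
    return round_up(jm_mb, increment_mb) + num_tms * round_up(tm_mb, increment_mb)


def fits_in_cluster(num_tms: int, jm_mb: int, tm_mb: int, available_mb: int,
                    increment_mb: int = 1024) -> bool:
    """True if the total request is within the free memory YARN reports (ignores per-node limits)."""
    return requested_memory_mb(num_tms, jm_mb, tm_mb, increment_mb) <= available_mb


if __name__ == "__main__":
    # Values from the suggested command: -yn 5 -yjm 768 -ytm 1400
    print(requested_memory_mb(num_tms=5, jm_mb=768, tm_mb=1400))  # 1024 + 5 * 2048 = 11264
    # Hypothetical cluster reporting 8192 MB free
    print(fits_in_cluster(5, 768, 1400, available_mb=8192))  # False
```

If the request does not fit, lower -yn or -ytm, or add capacity, before resubmitting.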

Related

vespa deploy --wait 300 app-1-getting-started gets error-code "METHOD_NOT_ALLOWED"

I am following this article to practice Vespa:
https://docs.vespa.ai/en/tutorials/news-1-getting-started.html
When I do this step:
vespa deploy --wait 300 app-1-getting-started
I get this error:
{
"error-code": "METHOD_NOT_ALLOWED",
"message": "Method 'POST' is not supported"
}
Why does this happen, and how can I fix this error?
I am unable to reproduce this; I just ran through the steps. I suggest you submit an issue at https://github.com/vespa-engine/vespa/issues with your environment details and include vespa.log so the Vespa team can have a look.
Vespa deploys to http://localhost:19071/, so if the service running on that port is not the Vespa configuration service but a different HTTP server that returns 405, that might explain the behavior you observe. The tutorial starts the Vespa container image with three port bindings, including:
8080:8080 is the Vespa container (data plane, read and write)
19071:19071 is the Vespa configuration service which accepts app package (control plane)
docker run -m 10G --detach --name vespa --hostname vespa-tutorial \
--publish 8080:8080 --publish 19071:19071 --publish 19092:19092 \
vespaengine/vespa
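One quick sanity check is whether your docker run command actually publishes 8080 and 19071 on the same host ports that the vespa CLI talks to. The sketch below only parses the command line (the helper names are hypothetical); it cannot tell whether some other HTTP server is already answering on localhost:19071, which is the situation described above, so you would still want to check what responds on that port.

```python
import shlex

CONTAINER_PORT = 8080       # data plane: queries and feeding
CONFIG_SERVER_PORT = 19071  # control plane: where `vespa deploy` sends the app package


def published_ports(docker_cmd: str) -> dict:
    """Return {host_port: container_port} for each --publish/-p HOST:CONTAINER flag."""
    # Drop shell line continuations so shlex sees one logical command line.
    args = shlex.split(docker_cmd.replace("\\\n", " "))
    mapping = {}
    for i, arg in enumerate(args):
        if arg in ("--publish", "-p") and i + 1 < len(args):
            spec = args[i + 1]
        elif arg.startswith("--publish="):
            spec = arg.split("=", 1)[1]
        else:
            continue
        if ":" not in spec:
            continue  # container port only: Docker picks a random host port
        host, container = spec.rsplit(":", 1)
        host_port = host.rsplit(":", 1)[-1]       # strip an optional bind IP
        container_port = container.split("/")[0]  # strip an optional /tcp or /udp
        mapping[int(host_port)] = int(container_port)
    return mapping


def missing_vespa_ports(docker_cmd: str) -> list:
    """Tutorial ports that are not published as HOST:CONTAINER with identical numbers."""
    ports = published_ports(docker_cmd)
    return [p for p in (CONTAINER_PORT, CONFIG_SERVER_PORT) if ports.get(p) != p]


if __name__ == "__main__":
    tutorial_cmd = """docker run -m 10G --detach --name vespa --hostname vespa-tutorial \\
--publish 8080:8080 --publish 19071:19071 --publish 19092:19092 \\
vespaengine/vespa"""
    print(missing_vespa_ports(tutorial_cmd))  # []
```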

How do you run pyflink scripts on AWS EMR?

I am struggling to run the basic word_count.py PyFlink example that ships with Apache Flink on AWS EMR.
Steps taken:
Successfully created an AWS EMR 6.5.0 cluster with the following applications: [Flink, Zookeeper]. Verified that the flink and flink-yarn-session binaries are in $PATH. AWS says it installed v1.14.
Ran the Java version successfully by doing the following:
sudo flink-yarn-session
sudo flink run -m yarn-cluster -yid <application_id> /usr/lib/flink/examples/batch/WordCount.jar
Tried running the same with Python, but no dice:
sudo flink run -m yarn-cluster -yid <application_id> -py /usr/lib/flink/examples/python/table/word_count.py
This fails, but the error makes it obvious that it's picking up Python 2.7 even though Python 3 is the default!
Fixed the issue by loosely following this link. Then tried a simple example that prints sys.version, which confirmed that it's picking up my Python version.
Tried again with a venv:
sudo flink run -m yarn-cluster -yid <application_id> -pyarch file:///home/hadoop/venv.zip -pyclientexec venv.zip/venv/bin/python3 -py /usr/lib/flink/examples/python/table/word_count.py
At this point, I start seeing various issues, ranging from file-not-found errors to the mysterious
pyflink.util.exceptions.TableException: org.apache.flink.table.api.TableException: Failed to execute sql
I tried various permutations, with and without the YARN cluster, but have made no progress so far.
I think my issues are either environment related (why AWS isn't taking care of the proper Python version is beyond me) or due to my inexperience with YARN/PyFlink.
Any pointer would be greatly appreciated.
This is what you do. To make a cluster:
aws emr create-cluster --release-label emr-6.5.0 --applications Name=Flink --configurations file://./config.json --region us-west-2 --log-uri s3://SOMEBUCKET --instance-type m5.xlarge --instance-count 2 --service-role EMR_DefaultRole --ec2-attributes KeyName=YOURKEYNAME,InstanceProfile=EMR_EC2_DefaultRole --steps Type=CUSTOM_JAR,Jar=command-runner.jar,Name=Flink_Long_Running_Session,Args=flink-yarn-session,-d
Contents of config.json:
[
{
"Classification": "flink-conf",
"Properties": {
"python.executable": "python3",
"python.client.executable": "python3"
},
"Configurations": [
]
}
]
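If you generate the EMR configuration from a script rather than by hand, a small sketch like this reproduces the config.json above. The function name is hypothetical; the classification and property names are the ones used in this answer.

```python
import json


def flink_python_config(executable: str = "python3") -> list:
    """EMR configuration that points PyFlink at a Python 3 interpreter."""
    return [
        {
            "Classification": "flink-conf",
            "Properties": {
                # Interpreter used by the Python worker processes that run Python UDFs
                "python.executable": executable,
                # Interpreter used on the client side when submitting Python jobs with flink run
                "python.client.executable": executable,
            },
            "Configurations": [],
        }
    ]


if __name__ == "__main__":
    # Redirect stdout to a file, e.g.  python make_config.py > config.json
    print(json.dumps(flink_python_config(), indent=2))
```

The script name make_config.py is only an example; pass the resulting file to aws emr create-cluster with --configurations file://./config.json as shown above.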
Then, once you are logged in, try this:
sudo flink run -m yarn-cluster -yid YID -py /usr/lib/flink/examples/python/table/batch/word_count.py
You can find the YID in the AWS EMR console under application user interfaces.
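If you would rather keep the venv approach from the question, note that -pyclientexec venv.zip/venv/bin/python3 assumes the archive's entries start with venv/. A minimal sketch for building an archive with that layout, assuming the virtual environment lives in a directory named venv (the helper name is hypothetical). It only fixes the archive layout; a venv built on one machine may still depend on that machine's system Python libraries.

```python
import shutil
from pathlib import Path


def pack_venv(venv_dir: str) -> str:
    """Zip venv_dir into <parent>/<name>.zip with every entry prefixed by <name>/.

    For /home/hadoop/venv this produces /home/hadoop/venv.zip containing
    venv/bin/python3, matching -pyclientexec venv.zip/venv/bin/python3.
    """
    venv = Path(venv_dir).resolve()
    return shutil.make_archive(
        str(venv.parent / venv.name),  # archive base name; ".zip" is appended
        "zip",
        root_dir=str(venv.parent),     # entry paths are relative to the venv's parent
        base_dir=venv.name,            # ...and start with the venv directory itself
    )
```

For example, pack_venv("/home/hadoop/venv") would return the path of /home/hadoop/venv.zip, which is the file the question passes to -pyarch.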

Where does my main method run when using yarn-cluster and detached mode?

I am new to Flink and am reading the Flink 1.8 source code (https://github.com/apache/flink/tree/release-1.8) to understand how Flink works with YARN.
I know there are detached and non-detached modes for the per-job cluster mode.
For the non-detached mode, such as the following command:
flink run -m yarn-cluster -c my.HelloFlink -yn 2 -ys 1 ./my.jar
After the YARN cluster is deployed, the client process starts to run my main method (my.HelloFlink#main), and the client process doesn't terminate until my main method has finished.
For the detached mode, such as the following command:
flink run -d -m yarn-cluster -c my.HelloFlink -yn 2 -ys 1 ./my.jar
After the YARN cluster is deployed, the client process terminates soon, but I couldn't find where (in which process) my main method (my.HelloFlink#main) runs. Could someone help me out and tell me where my main method runs?
I have struggled with this question for days. Thanks very much!
When you use flink run ... you are running a bash script which ends with this line:
exec $JAVA_RUN $JVM_ARGS $FLINK_ENV_JAVA_OPTS "${log_setting[@]}" -classpath "`manglePathList "$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" org.apache.flink.client.cli.CliFrontend "$@"
This fires up a JVM running the CliFrontend, and your main method runs there. What your main method does is construct a job graph and submit it to the YARN cluster, along with its dependencies. If you run in detached mode, this CliFrontend process simply exits after submitting the job, as it is no longer useful.
By the way, Flink 1.11 has added a new flink run-application deployment target that runs the main method in the Job Manager instead. This has significant advantages in some situations; see Application Deployment in Flink: Current State and the new Application Mode for details.
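The answer can be summarised as a small lookup table. This is only a restatement of the explanation above in Python form, with illustrative labels; it is not taken from the Flink code base.

```python
from enum import Enum


class Mode(Enum):
    """Submission styles discussed in this thread (labels are illustrative)."""
    PER_JOB_ATTACHED = "flink run -m yarn-cluster ..."
    PER_JOB_DETACHED = "flink run -d -m yarn-cluster ..."
    APPLICATION = "flink run-application ..."  # available since Flink 1.11


# Which process executes the user's main() method.
MAIN_RUNS_IN = {
    Mode.PER_JOB_ATTACHED: "client JVM (CliFrontend)",
    Mode.PER_JOB_DETACHED: "client JVM (CliFrontend)",
    Mode.APPLICATION: "JobManager",
}

# Per-job modes only: what the client does after main() has submitted the job graph.
CLIENT_AFTER_SUBMIT = {
    Mode.PER_JOB_ATTACHED: "waits until main() finishes",
    Mode.PER_JOB_DETACHED: "exits once the job is submitted",
}
```

The only difference between the two per-job variants is what the client does afterwards; application mode moves main() itself off the client.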

flink on yarn error "Yarn only has -1 virtual cores available"

I have installed Cloudera CDH 6, and now I want to install Flink using the package from the Flink website and run it as "Flink on YARN".
I have done the following steps:
1. Edit /etc/profile:
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-6.0.0-1.cdh6.0.0.p0.537114/lib/hadoop/etc/hadoop
2. Execute ./yarn-session.sh -n 4 -jm 2048 -tm 2048 -s 3 -nm FlinkOnYarnSession -d -st
But it always shows:
The number of virtual cores per node were configured with 4 but Yarn only has -1 virtual cores available. Please note that the number of virtual cores is set to the number of task slots by default unless configured in the Flink config with 'yarn.containers.vcores.'
This is a new cluster with no jobs running, and the YARN web UI shows 20 vcores available.
Please help with this problem. Thank you very much!
This is due to a bug in Flink; the details can be viewed here:
https://issues.apache.org/jira/browse/FLINK-5542
With Flink 1.6.1, I solved this by modifying yarn-site.xml and adding the cpu-vcores parameter.
vim $HADOOP_CONF_DIR/yarn-site.xml
Add the yarn.nodemanager.resource.cpu-vcores property, for example setting it to 8:
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
</property>
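Editing the file by hand works fine; if you manage yarn-site.xml from a script, a sketch using Python's standard xml.etree.ElementTree could look like this. The function name is hypothetical, and ElementTree drops comments and the XML declaration when it rewrites the document, so keep a backup of the original file. The NodeManagers read this setting at startup, so restart them afterwards.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Set (or add) <property><name>name</name><value>value</value></property>
    in a Hadoop-style <configuration> document and return the new XML text."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_elem = prop.find("value")
            if value_elem is None:
                value_elem = ET.SubElement(prop, "value")
            value_elem.text = value
            break
    else:
        # No existing entry with this name: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Usage: read $HADOOP_CONF_DIR/yarn-site.xml into a string, call set_hadoop_property(text, "yarn.nodemanager.resource.cpu-vcores", "8"), and write the result back.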

Flink 1.3 running a single job on YARN how to set the number of Task Slots per TaskManager

I am running a single Flink job on YARN as described here.
flink run -m yarn-cluster -yn 3 -ytm 12000
I can set the number of YARN nodes / TaskManagers with the -yn parameter above. However, I want to know whether it is possible to set the number of task slots per TaskManager. When I use the parallelism (-p) parameter, it only sets the overall parallelism, and the number of task slots is computed by dividing this value by the number of provided TaskManagers. I tried using the dynamic properties (-yD) parameter, which is supposed to "allow the user to specify additional configuration values", like this:
-yD -Dtaskmanager.numberOfTaskSlots=8
But this does not overwrite the value given in the flink-conf.yaml.
Is there any way to specify the number of task slots per TaskManager when running a single job on Flink (other than changing the config file)?
Also, is there documentation on which dynamic properties are valid with the -yD parameter?
You can use the yarn-session settings (described here), prefixed with y, to submit a Flink job to a YARN cluster. For example, the command
flink run -m yarn-cluster -yn 5 -yjm 768 -ytm 1400 -ys 2 -yqu streamQ my_program.jar
will submit the my_program.jar Flink application with 5 TaskManager containers, 768 MB of memory for the JobManager, and 1400 MB of memory and 2 task slots for each TaskManager (by default this also means 2 vcores per TaskManager), using the NodeManager resources of the predefined YARN queue streamQ. See my answer to this post for other important information.
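The arithmetic behind -yn and -ys is simple: the job gets -yn times -ys slots in total, so for a given parallelism and a fixed number of slots per TaskManager you need ceil(parallelism / -ys) TaskManagers. A minimal sketch with hypothetical helper names; it only does the counting and says nothing about whether YARN can actually grant those containers.

```python
import math


def total_slots(num_tms: int, slots_per_tm: int) -> int:
    """Slots available to the job: -yn TaskManagers times -ys slots each."""
    return num_tms * slots_per_tm


def taskmanagers_needed(parallelism: int, slots_per_tm: int) -> int:
    """Smallest -yn such that -yn * -ys slots cover the requested parallelism (-p)."""
    if parallelism < 1 or slots_per_tm < 1:
        raise ValueError("parallelism and slots_per_tm must be positive")
    return math.ceil(parallelism / slots_per_tm)


if __name__ == "__main__":
    print(total_slots(5, 2))           # -yn 5 -ys 2 -> 10 slots
    print(taskmanagers_needed(24, 8))  # -ys 8 and -p 24 -> 3 TaskManagers
```

With the 8 slots per TaskManager asked about in the question, -ys 8 together with -yn from taskmanagers_needed gives the intended layout without touching flink-conf.yaml.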
