Add Flink Job Jar in Docker Setup and run Job via Flink REST API - apache-flink

We're running Flink in Cluster Session mode and automatically add Jars in the Dockerfile:
ADD pipeline-fat.jar /opt/flink/usrlib/pipeline-fat.jar
So that we can run this jar via the Flink REST API without needing to upload it in advance:
POST http://localhost:8081/jars/:jarid/run
But the "static" Jar is now shown, to get the :jarid:
GET http://localhost:8081/jars
So my question is:
Is it possible to run a usrlib jar using the Flink REST API?
Or can you only reference such jars via the CLI
flink run -d -c ${JOB_CLASS_NAME} /job.jar
or via standalone-job mode (standalone-job --job-classname com.job.ClassName)?
My alternative approach (workaround) would be to upload the jar in the Docker entrypoint.sh of the jobmanager container:
curl -X POST http://localhost:8084/jars/upload \
-H "Expect:" \
-F "jarfile=#./pipeline-fat.jar"

I believe that it is unfortunately not currently possible to start a Flink cluster in session mode with a jar pre-baked in the Docker image and then start the job using the REST API calls (as you showed).
However, your workaround approach seems like a good idea to me. I would be curious to see whether it works for you in practice.
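For reference, a rough sketch of what that entrypoint workaround could look like, assuming curl and jq are available inside the jobmanager container and the REST API is reachable on port 8081; the entry class name and parallelism are placeholders:
#!/usr/bin/env bash
# Upload the jar that was baked into the image via the REST API.
UPLOAD_RESPONSE=$(curl -s -X POST http://localhost:8081/jars/upload \
  -H "Expect:" \
  -F "jarfile=@/opt/flink/usrlib/pipeline-fat.jar")
# The jar id is the basename of the "filename" field in the upload response.
JAR_ID=$(echo "$UPLOAD_RESPONSE" | jq -r '.filename' | xargs basename)
# Submit the job (entry-class and parallelism are illustrative values).
curl -s -X POST "http://localhost:8081/jars/${JAR_ID}/run?entry-class=com.example.PipelineJob&parallelism=1"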

I managed to run a usrlib jar using the command-line interface.
I edited docker-compose to run a custom docker-entrypoint.sh.
I added the following to the original docker-entrypoint.sh:
run_user_jars() {
  echo "Starting user jars"
  ./bin/flink run /opt/flink/usrlib/my-job-0.1.jar &
}
run_user_jars
...
And I edited the original entrypoint for the jobmanager in the docker-compose.yml file:
entrypoint: ["bash", "/opt/flink/usrlib/custom-docker-entrypoint.sh"]

Related

Flink EMR Deployment can't pick up Yarn context and executes only as a local application

I'm deploying my Flink app on AWS EMR 6.2
Flink version: 1.11.2
I configured the step with:
Arguments :"flink-yarn-session -d -n 2"
as described here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-jobs.html
Both the cluster and the step are in state 'running'.
The controller logs shows:
INFO startExec 'hadoop jar
/mnt/var/lib/hadoop/steps/s-309XLKSZGN4V4/trigger-service-***.jar
flink-yarn-session -d -n 2'
However, the syslog shows that the application didn't pick up the YARN context.
2021-02-24 13:19:57,772 INFO org.apache.flink.runtime.minicluster.MiniCluster (main): Starting Flink Mini Cluster
I pulled out the class name returned by StreamExecutionEnvironment.getExecutionEnvironment in my app, and indeed it is LocalStreamEnvironment.
The application itself runs properly on the JobMaster instance as a local app.
It turns out that the -n flag is deprecated in newer versions of Flink, so, per the advice from AWS support, I used this command instead:
flink-yarn-session -d -s 2
Arguments meaning:
-d,--detached Start detached
-s,--slots Number of slots per TaskManager
In addition, in order to actually run the Flink job, it was required to create two steps:
--steps '[{"Args":["flink-yarn-session","-d","-s","2"],"Type":"CUSTOM_JAR","MainClass":"","ActionOnFailure":"CANCEL_AND_WAIT","Jar":"command-runner.jar","Properties":"","Name":"Yarn Long Running Session"},{"Args":["bash","-c","aws s3 cp s3://<URL of Application Jar> .;/usr/bin/flink run -m yarn-cluster -p 2 -d <Application Jar>"],"Type":"CUSTOM_JAR","MainClass":"","ActionOnFailure":"CANCEL_AND_WAIT","Jar":"command-runner.jar","Properties":"","Name":"Flink Job"}]'
The first uses command-runner.jar to execute
flink-yarn-session -d -s 2
and the second uses command-runner.jar to download my application jar from S3 and run the job with Flink in yarn-cluster mode.
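For readability, here are the same two steps pretty-printed, as they might be passed to aws emr add-steps (the cluster id is a placeholder):
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps '[
  {
    "Name": "Yarn Long Running Session",
    "Type": "CUSTOM_JAR",
    "Jar": "command-runner.jar",
    "ActionOnFailure": "CANCEL_AND_WAIT",
    "Args": ["flink-yarn-session", "-d", "-s", "2"]
  },
  {
    "Name": "Flink Job",
    "Type": "CUSTOM_JAR",
    "Jar": "command-runner.jar",
    "ActionOnFailure": "CANCEL_AND_WAIT",
    "Args": ["bash", "-c", "aws s3 cp s3://<URL of Application Jar> .; /usr/bin/flink run -m yarn-cluster -p 2 -d <Application Jar>"]
  }
]'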

How to get the deployment.yaml file for MSSQL cluster in Kubernetes

I am installing a SQL Server cluster in a Kubernetes setup using the document below.
Check here.
I am able to install the cluster successfully, but I need to customize the deployment by specifying a custom Docker image and adding additional containers.
May I know how to get the deployment YAML file & Dockerfile for all the images in the running containers?
I have tried "kubectl edit", but not able to edit required details.
The easiest way of doing that is using something like:
kubectl get deployment -n %yournamespace% -o yaml > export.yaml
For Kubernetes YAML files you can use:
kubectl get <pod,svc,all,etc> <name> --export=true -o yaml
or
kubectl get <pod,svc,all,etc> <name> -o yaml
For Dockerfiles, there is a whole post on Stack Overflow that explains how to create a Dockerfile from an image.
The Kubernetes Cheat Sheet is a good source for kubectl commands.
Let's say your deployments are created in the develop namespace. The command below will give you the YAML for your deployment; the --export flag removes cluster-specific information.
kubectl get deploy <your-deployment-name> --namespace=develop -o yaml --export > your-deployment.yaml
The command below will give you JSON output for your deployment.
kubectl get deploy <your-deployment-name> --namespace=develop -o json --export > your-deployment.json
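Once you have the YAML, here is a minimal sketch of the customization round-trip (deployment, container, and image names are placeholders):
# Export the current deployment definition.
kubectl get deploy <your-deployment-name> --namespace=develop -o yaml > your-deployment.yaml
# Edit your-deployment.yaml (point the container image at your custom image,
# add extra containers, etc.), then re-apply it.
kubectl apply --namespace=develop -f your-deployment.yaml
# Alternatively, swap just the image in place.
kubectl set image deployment/<your-deployment-name> <container-name>=<your-custom-image> --namespace=develop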

connect SQL to apache nifi

I'm new to NiFi and I want to connect a SQL Server database to NiFi and create a data flow with the processors. How can I do this? Can anyone help me with this?
Thanks in advance,
sam
Here are two great articles on getting information in and out of databases with NiFi:
http://www.batchiq.com/database-injest-with-nifi.html
http://www.batchiq.com/database-extract-with-nifi.html
They describe and illustrate how to configure a DBCPConnectionPool service to provide connections to an RDBMS, with example flows to extract and ingest data.
Expanding on mattyb's answer:
If you are using the latest Hortonworks sandbox, or another setup that uses Docker containers, read below.
You have to install the JDBC jar file inside the Docker container. For SQL Server, the driver should be version 6.2 or above.
docker ps
docker exec -it <mycontainer uuid> bash
The question "How do I get into a Docker container's shell?" will help you log into the container.
cd /usr/lib/jvm/jre/lib/
mkdir jdbc
cd ./jdbc
wget https://download.microsoft.com/download/3/F/7/3F74A9B9-C5F0-43EA-A721-07DA590FD186/sqljdbc_6.2.2.0_enu.tar.gz
tar xvzf sqljdbc_6.2.2.0_enu.tar.gz
cp ./sqljdbc_6.2/enu/mssql-jdbc-6.2.2.jre8.jar ./
Then, in the DBCPConnectionPool service, use the following connection URL and driver class:
jdbc:sqlserver://192.168.1.201:1433;databaseName=[your database]
com.microsoft.sqlserver.jdbc.SQLServerDriver
You might need to replace the IP address with the IPv4 address of your host, found via ipconfig on Windows or ifconfig on Mac/Linux.
You may change file:///usr/lib/jvm/jre/lib/ to any path you desire.
Expanding on TamusJRoyce's answer
If you are running NiFi via a Docker image like apache/nifi, or the aforementioned Hortonworks sandbox, the following should help you get the required driver onto the image so that you don't need to exec into the container to do it manually.
See the comments below the Dockerfile.
FROM apache/nifi
USER root
RUN mkdir /lib/jdbc
WORKDIR /lib/jdbc
RUN wget https://download.microsoft.com/download/3/F/7/3F74A9B9-C5F0-43EA-A721-07DA590FD186/sqljdbc_6.2.2.0_enu.tar.gz
RUN tar xvzf sqljdbc_6.2.2.0_enu.tar.gz
RUN cp ./sqljdbc_6.2/enu/mssql-jdbc-6.2.2.jre8.jar ./
USER nifi
EXPOSE 8080 8443 10000 8000
WORKDIR ${NIFI_HOME}
ENTRYPOINT ["../scripts/start.sh"]
The image above uses apache/nifi as the base image. You can use any NiFi Docker image as a base if you would like.
You can specify any location for lib/jdbc; just remember to use that path when referencing the driver, i.e. file:///lib/jdbc/mssql-jdbc-6.2.2.jre8.jar.
Lastly, switch back to the nifi user and finish off with the standard NiFi image details. This allows the image to run correctly.
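A quick usage sketch for the Dockerfile above (the image tag is arbitrary):
# Build the image with the bundled SQL Server JDBC driver and run NiFi.
docker build -t nifi-mssql .
docker run -d --name nifi -p 8080:8080 nifi-mssql
In the DBCPConnectionPool settings you would then point the driver location at file:///lib/jdbc/mssql-jdbc-6.2.2.jre8.jar as described above.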

Database migrations in docker swarm mode

I have an application that consists of a simple Node app and a Mongo database. I wonder how I could run database migrations in Docker swarm mode.
Without swarm mode, I run migrations by first stopping the old version of the application, running a one-off migration command with the new version, and finally starting the new version of the app:
# Setup is roughly the following
$ docker network create appnet
$ docker run -d --name db --net appnet db:1
$ docker run -d --name app --net appnet -p 80:80 app:1
# Update process
$ docker stop app && docker rm app
$ docker run --rm --net appnet app:2 npm run migrate
$ docker run -d --name app --net appnet -p 80:80 app:2
Now I'm testing the setup in Docker swarm mode so that I can easily scale the app. The problem is that in swarm mode one can't attach standalone containers to the swarm network, and thus I can't reach the db to run migrations:
$ docker network ls
NETWORK ID          NAME      DRIVER    SCOPE
6jtmtihmrcjl        appnet    overlay   swarm
# Trying to replicate the manual migration process in swarm mode
$ docker service scale app=0
$ docker run --rm --net appnet app:2 npm run migrate
docker: Error response from daemon: swarm-scoped network (appnet) is not compatible with `docker create` or `docker run`. This network can only be used by a docker service.
I don't want to run the migration command during app startup either, as several instances might be launching at once and that could potentially mess up the database. Automatic migrations are scary, so I want to avoid them at all costs.
Do you have any idea how to implement a manual migration step in Docker swarm mode?
Edit
I found a dirty hack that allows me to replicate the original workflow. The idea is to create a new service with a custom command and remove it once one of its tasks is finished. This is far from pleasant to use; better alternatives are more than welcome!
$ docker service scale app=0
$ docker service create --name app-migrator --network appnet app:2 npm run migrate
# Check when the first app-migrator task is finished and check its output
$ docker service ps app-migrator
$ docker logs <container id from app-migrator>
$ docker service rm app-migrator
# Ready to update the app
$ docker service update --image app:2 --replicas 2 app
I believe you can fix this problem by making your overlay network, appnet, attachable. This can be accomplished with the following command:
docker network create --driver overlay --attachable appnet
This should fix the swarm-scoped network error and allow you to run migrations.
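If the network already exists, it has to be recreated as attachable before a one-off container can join it; a sketch (note that services using the network must be removed or updated first):
# Recreate the overlay network as attachable.
docker network rm appnet
docker network create --driver overlay --attachable appnet
# The one-off migration container can now join the swarm-scoped network.
docker run --rm --network appnet app:2 npm run migrate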
This is indeed a tricky situation, though I think running the migration during startup might be the final piece of the puzzle.
The way I do it right now (not very elegant, but it works) is to use a message queue (I'm using Redis). On app startup, the app sends a message to the queue indicating that the migration task needs to be run. At the other end of the queue, a listener app processes the queue and runs the migration task. The migration task only runs once, since there is only a single instance of the listener running it sequentially. So essentially I'm just using the queue and the listener app to make sure that the migration task runs only once.

Binding database to app in cloudbees when deploying via jenkins

I have now managed to deploy a web app on RUN@cloud. I do have the CloudBees Deployer plugin in Jenkins; however, I am looking for a way to use the Bees SDK to bind a database to the deployed app. I was wondering how I go about it.
Currently, I deploy it via Jenkins as a post-build action.
You can configure the Bees SDK in DEV@cloud with a script like this (assuming that you have uploaded a build secret zip file containing your ~/.bees/bees.config using the environment variable ${SECRET} - please see the Build Secret Plugin).
Run this as an "Execute Shell" task within Jenkins, and then you can use the Bees SDK in the normal way to bind the database (or any resource) to your app, e.g.
bees app:bind -a acme/test -db mydb
See the Database Guide for more details.
Jenkins Execute Shell Script:
if [[ ! -d "${WORKSPACE}/bees-sdks" ]]
then
mkdir ${WORKSPACE}/bees-sdks
fi
cd ${WORKSPACE}/bees-sdks;
curl -o cloudbees-sdk-1.5.0-bin.zip http://cloudbees-downloads.s3.amazonaws.com/sdk/cloudbees-sdk-1.5.0-bin.zip;
unzip -o cloudbees-sdk-1.5.0-bin.zip
rm cloudbees-sdk-1.5.0-bin.zip
PATH=${WORKSPACE}/bees-sdks/cloudbees-sdk-1.5.0:$PATH; export PATH
if [[ ! -d ~/.bees ]]
then
mkdir ~/.bees
fi
cp ${SECRET}/bees.config ~/.bees/bees.config
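With the SDK on the PATH and bees.config in place, the bind command from above can simply be appended to the same shell step, e.g.:
# Bind the database to the app (account/app and database names are the example values from above).
bees app:bind -a acme/test -db mydb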
I've done an online example here that illustrates how this all works. Sorry this is a little more complicated than we would like: we are working to make it smoother and I will update this answer shortly once the changes go live.
