connect SQL to apache nifi - sql-server

I'm new to NiFi and I want to connect a SQL Server database to NiFi and create a data flow with the processors. How can I do this? Can anyone help me with this clearly?
Thanks in Advance
sam

Here are two great articles on getting information in and out of databases with NiFi:
http://www.batchiq.com/database-injest-with-nifi.html
http://www.batchiq.com/database-extract-with-nifi.html
They describe/illustrate how to configure a DBCPConnectionPool service to provide connection(s) to an RDBMS, and example flows to extract data and ingest data.
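For orientation, a minimal extract flow can be sketched like this (a sketch only; QueryDatabaseTable, ConvertAvroToJSON, and PutFile are standard NiFi processors, while the table and column names are placeholders):
QueryDatabaseTable -> ConvertAvroToJSON -> PutFile
with the key QueryDatabaseTable properties being roughly:
Database Connection Pooling Service: (your DBCPConnectionPool)
Table Name: my_table
Maximum-value Columns: id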

Expanding on mattyb's answer
If you are using the latest Hortonworks sandbox, or another setup that uses Docker containers, read below.
You have to install the JDBC jar file inside the Docker container. For SQL Server, the driver should be version 6.2 or above.
docker ps
docker exec -it <mycontainer uuid> bash
The answer to "How do I get into a Docker container's shell?" will help you log into the container.
cd /usr/lib/jvm/jre/lib/
mkdir jdbc
cd ./jdbc
wget https://download.microsoft.com/download/3/F/7/3F74A9B9-C5F0-43EA-A721-07DA590FD186/sqljdbc_6.2.2.0_enu.tar.gz
tar xvzf sqljdbc_6.2.2.0_enu.tar.gz
cp ./sqljdbc_6.2/enu/mssql-jdbc-6.2.2.jre8.jar ./
In the DBCPConnectionPool service, set the Database Connection URL to:
jdbc:sqlserver://192.168.1.201:1433;databaseName=[your database]
and the Database Driver Class Name to:
com.microsoft.sqlserver.jdbc.SQLServerDriver
You might need to replace the IP address with the IPv4 address of your host, found via ipconfig on Windows or ifconfig on Mac/Linux.
You may change file:///usr/lib/jvm/jre/lib/ to any path you desire; this is the path the service's driver location property must point to.
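Putting the pieces together, the DBCPConnectionPool would then be configured roughly as follows (property names as in recent NiFi releases; the database name and credentials are placeholders):
Database Connection URL: jdbc:sqlserver://192.168.1.201:1433;databaseName=mydb
Database Driver Class Name: com.microsoft.sqlserver.jdbc.SQLServerDriver
Database Driver Location(s): file:///usr/lib/jvm/jre/lib/jdbc/mssql-jdbc-6.2.2.jre8.jar
Database User: sa
Password: (your SQL Server password)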

Expanding on TamusJRoyce's answer
If you are running NiFi via a Docker image like apache/nifi, or the aforementioned Hortonworks sandbox, the following should help you get the required driver onto the image so that you don't need to exec into the container and install it manually.
See the comments below the Dockerfile.
FROM apache/nifi
USER root
RUN mkdir /lib/jdbc
WORKDIR /lib/jdbc
RUN wget https://download.microsoft.com/download/3/F/7/3F74A9B9-C5F0-43EA-A721-07DA590FD186/sqljdbc_6.2.2.0_enu.tar.gz
RUN tar xvzf sqljdbc_6.2.2.0_enu.tar.gz
RUN cp ./sqljdbc_6.2/enu/mssql-jdbc-6.2.2.jre8.jar ./
USER nifi
EXPOSE 8080 8443 10000 8000
WORKDIR ${NIFI_HOME}
ENTRYPOINT ["../scripts/start.sh"]
The above uses apache/nifi as the base image, but you can use any NiFi Docker image as a base if you would like.
You can specify any location for /lib/jdbc; just remember that this is the path the driver location must reference, i.e. file:///lib/jdbc/mssql-jdbc-6.2.2.jre8.jar.
Lastly, switch back to the nifi user and finish off with the standard NiFi image details (ports, workdir, entrypoint). This will allow the image to run correctly.
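For completeness, building and running the image might look like this (the tag is an arbitrary choice; 8080 is the HTTP port that older apache/nifi images, like the one EXPOSEd above, listen on):
docker build -t nifi-mssql .
docker run -d --name nifi -p 8080:8080 nifi-mssql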

Related

How to connect superset to postgresql - The port is closed

My operating system is Linux.
I am going to connect Superset to PostgreSQL.
PostgreSQL port is open and its value is 5432.
PostgreSQL is also running and not closed.
Unfortunately, after a day of research on the Internet, I could not solve the problem and it gives the following error:
The port is closed.
Checking the database port:
$ lsof -i TCP:5432
python3 13127 user 13u IPv4 279806 0t0 TCP localhost:40166->localhost:postgresql (ESTABLISHED)
python3 13127 user 14u IPv4 274261 0t0 TCP localhost:38814->localhost:postgresql (ESTABLISHED)
Please help me, I am a beginner, but I searched a lot and did not get any results.
Since you're running Superset in a Docker container, you can't use 127.0.0.1 nor localhost, since they resolve to the container, not the host. For the host, use host.docker.internal.
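Concretely, the SQLAlchemy URI you enter in Superset would look something like this (user, password, and database name are placeholders):
postgresql://myuser:mypassword@host.docker.internal:5432/mydb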
I had a similar problem using Docker Compose. "Port is closed" can be due to a networking problem. host.docker.internal didn't work for me on Ubuntu 22. I'd recommend not following the official doc and taking the better approach of starting with a single Docker image. Instead of running five containers via Compose, run everything in one. Use the official Docker image, then modify the Dockerfile as follows to install a custom DB driver:
FROM apache/superset
USER root
RUN pip install mysqlclient
RUN pip install sqlalchemy-redshift
USER superset
The second step is to build a new image from that Dockerfile. To avoid networking problems, start both containers (Superset and your database) on the same network; the easiest option is the host network. I used this on Google Cloud, for example:
docker run -d --network host --name superset supers
Start the container with your database using the same command, with --network host. This solved my problems. More about it in the full step-by-step tutorial on Medium or on my blog.
You set port 5432 in the configuration file, but that alone does not mean your PostgreSQL service is actually reachable.
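One way to verify reachability is to test the TCP connection from wherever Superset actually runs, for example from inside its container (the container name superset_app is a placeholder; the one-liner only needs Python, which the Superset image ships with):
docker exec -it superset_app python3 -c "import socket; socket.create_connection(('host.docker.internal', 5432), timeout=3); print('port open')"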

Access a database backup-file on mac? (.bak)

I'm running a localhost database through Docker on Mac. I have an assignment that requires me to hand in the .bak file along with the program I wrote. I'm using Azure Data Studio as my DBMS. I can't find these files anywhere, and from Googling the matter it doesn't seem to be a common issue for other Mac users.
How do I access these from Finder? Or is there another way to do this?
You can access a Docker container's file system from the macOS host; the steps below follow a tutorial on the topic.
To access the file system of a particular container, first get the container ID using the inspect command on the Docker host:
docker inspect --format '{{.Id}}' <container name>
Then use the Alpine Docker image and mount your host file system into the container:
docker run --rm -it -v /:/vm-root alpine:edge sh
We need the ID of this container, so you could combine steps 1 and 2 with the following:
docker run --rm -it -e CONTAINER_ID=$(docker inspect --format '{{.Id}}' <container name>) -v /:/vm-root alpine:edge sh
Now we have the CONTAINER_ID set as an environment variable in the alpine container.
Once you are in the alpine container, you can visit the following directory
cd /vm-root/var/lib/docker
Inside this directory, you will be able to access all the familiar files that you are used to when administering Docker
Now we need to find the mount-id for the selected container in order to access its file system directories. We will use the CONTAINER_ID environment variable obtained in step 2. I have AUFS as my file system driver for this example (newer Docker installs typically use overlay2, so adjust the paths accordingly). To do that, use the following command:
MOUNT_ID=$(cat /vm-root/var/lib/docker/image/aufs/layerdb/mounts/$CONTAINER_ID/mount-id)
The above step will give you the mount-id. Now you can access the container's file system under the mnt directory using that mount-id:
ls -ltr /vm-root/var/lib/docker/aufs/mnt/$MOUNT_ID
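If all you need is the .bak file itself, a simpler route may be docker cp from the macOS host; for SQL Server containers the data/backup directory is typically /var/opt/mssql/data (the container name and file name here are placeholders):
docker cp mysqlserver:/var/opt/mssql/data/assignment.bak ~/Desktop/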

Connect to docker sqlserver via ssh

I've created a Docker container that contains an MSSQL database. On the command line, ip a gives an IP address for the container; however, trying to SSH into it with ssh username@docker_ip_address yields ssh: connect to host ip_address port 22: Connection refused. So I'm wondering if I am even able to SSH into the container, so I don't have to always be using docker exec ..., and if so, how would I go about doing that?
To SSH into a container, you need to fulfill the following:
An SSH server (OpenSSH) must be installed within the container and the ssh service must be running.
Port 22 must be published when you run the container; more info here: Publish ports on Docker.
The docker ps command should then display port 22 as mapped, as in the example below.
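For example, assuming an image that already runs sshd, publishing port 22 and connecting would look like this (the image name is a placeholder):
docker run -d -p 2222:22 my-mssql-with-ssh
ssh -p 2222 root@localhost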
Hope the above information helps you understand the situation.
If your container contains a database server, the normal way to interact with it will be through an SQL client that connects to it; Google suggests SQL Server Management Studio, and connector libraries exist for popular languages. I'm not clear what you would do given a shell in the container, and my main recommendation here would be to focus on working with the server in the normal way.
Docker containers normally run a single process, and that's normally the main server process. In this case, the container runs only SQL Server. As some other answers here suggest, you'd need to significantly rearchitect the container to even have it be possible to run an ssh daemon, at which point you need to worry about a bunch of other things like ssh host keys and user accounts and passwords that a typical Docker image doesn't think about at all.
Also note that the Docker-internal IP address (what you got from ip addr; what docker inspect might tell you) is essentially useless. There are always better ways to reach a container (using inter-container DNS to communicate between containers; using the host's IP address or DNS name to reach published ports from the same or other hosts).
Basically, alter your Dockerfile to something like the following - it will install openssh-server, replace the prohibitive default config, and start the service:
# FROM a-image-with-mssql
RUN echo "root:toor" | chpasswd
RUN apt-get update
RUN apt-get install -y openssh-server
COPY entrypoint.sh .
RUN cd /;wget https://gist.githubusercontent.com/spekulant/e04521d6c6e1ccffbd3455c673518c5b/raw/1e4f6f2cb32caf3a4a9f73b02efdcbd5dde4ba7a/sshd_config
RUN rm /etc/ssh/sshd_config; cp sshd_config /etc/ssh/
ENTRYPOINT ["./entrypoint.sh"]
# further commands
Now you've got yourself an image with an SSH server inside. All you have to do is start the service. You can't do RUN service ssh start because it won't work - a Docker specific; refer to the documentation. You have to use an entrypoint like the following:
#!/bin/bash
set -e
sh -c 'service ssh start'
exec "$#"
Put it in a file entrypoint.sh next to your Dockerfile - remember to chmod 755 entrypoint.sh. There's one thing to mention here: you still wouldn't be able to ssh into the container - the default SSH server configuration doesn't allow login to the root account using a password. So you either change the config yourself and provide it to the image, or you can trust me and use the file I created - inspect it with the link from the Dockerfile - nothing malicious there, only a change of PermitRootLogin from prohibit-password to yes.
Fortunately for us, the official MSSQL images are based on Ubuntu, so all the commands above fit the environment perfectly.
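Putting it together, a build-and-connect session could look like this (the tag is arbitrary; /opt/mssql/bin/sqlservr is the server binary in the official image, passed as the command because the entrypoint execs "$@"; the root password toor comes from the chpasswd line above):
docker build -t mssql-ssh .
docker run -d -p 1433:1433 -p 2222:22 mssql-ssh /opt/mssql/bin/sqlservr
ssh -p 2222 root@localhost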
Edit
Be sure to ask if something is unclear or I'm jumping too fast.

I can't locate my host directory, which I attached to Docker

I have a slight problem. I've been trying to get some persistent data with ArangoDB and Docker. I passed an argument to Docker that attaches a host directory to a path within the container. It's all fine to this point, but I'm stuck in an enigma as to where the hell this directory is.
1) This is a sample command which resembles mine:
docker run -e ARANGO_ROOT_PASSWORD='mypass' -p 80:8529 -d -v myhostfolder:/var/lib/arangodb3 arangodb
So, the problem is I can't find myhostfolder anywhere on my host machine, which runs Docker. The data within it is persistent, and I can access it, but only through the Docker container. I think the data is somewhere on my host machine; I've been trying to pass a couple of these "relative" folders, and they all keep persistent data, so I doubt the data is in the actual Docker container.
2) If I do something like this (providing an absolute path)
docker run -e ARANGO_ROOT_PASSWORD='mypass' -p 80:8529 -d -v /home/myhostfolder:/var/lib/arangodb3 arangodb
then I have no issues with locating /home/myhostfolder.
So my question is, where on my OS X 10.12 system is the myhostfolder from example 1)?
Thanks for your help!
The host-dir can either be an absolute path or a name value. If you supply an absolute path for the host-dir, Docker bind-mounts to the path you specify. If you supply a name, Docker creates a named volume by that name.
Refer https://docs.docker.com/engine/tutorials/dockervolumes/#mount-a-host-directory-as-a-data-volume.
In your case, since myhostfolder is a name, Docker creates a named volume. Execute the command below, which lists the volumes; a volume with the name myhostfolder will be shown.
docker volume ls
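To see where Docker keeps that volume's data, inspect it and look at the Mountpoint field (on Docker for Mac that path lives inside Docker's Linux VM, which is why you won't find it in Finder):
docker volume inspect myhostfolder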

How to copy files from local machine to docker container on windows

I have to import data files from a user's local folder C:/users/saad/bdd into a Docker container (cassandra), and I couldn't find how to proceed using Docker commands.
I'm working on Windows 7.
Use docker cp.
docker cp c:\path\to\local\file container_name:/path/to/target/dir/
If you don't know what's the name of the container, you can find it using:
docker ps --format "{{.Names}}"
When using Docker Toolbox, there seems to be another issue related to absolute paths.
I am communicating with the containers using the "Docker Quickstart Terminal", which essentially is a MINGW64 environment.
If I try to copy a file with an absolute path to a container, I receive the error message:
$ docker cp /d/Temp/my-super-file.txt container-name:/tmp/
copying between containers is not supported
If I use a relative path, it simply works.
$ cd /d/
$ docker cp Temp/my-super-file.txt container-name:/tmp/
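The underlying cause is MSYS/MinGW path conversion rewriting the container path; if you do need an absolute source path, disabling the conversion for that one command is a known workaround in MSYS-based shells like the Quickstart Terminal:
MSYS_NO_PATHCONV=1 docker cp /d/Temp/my-super-file.txt container-name:/tmp/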
P.S.: I am posting this as an answer because I lack the reputation for a comment.
Simple way:
From Docker container to local machine:
$ docker cp containerId:/sourceFilePath/someFile.txt C:/localMachineDestinationFolder
From local machine to Docker container:
$ docker cp C:/localMachineSourceFolder/someFile.txt containerId:/containerDestinationFolder
It is not as straightforward when using Docker Toolbox. Because Docker Toolbox only has access to the C:\Users\ folder and there is an Oracle VirtualBox VM in between, when you do get to copy the folder it is not copied directly into the container but instead to a mounted volume handled by the Oracle VM, like so:
/mnt/sda1/var/lib/docker/volumes/19b65e5d9f607607441818d3923e5133c9a96cc91206be1239059400fa317611/_data
How I got around this is by just editing my Dockerfile:
FROM cassandra:latest
ADD cassandra.yml /etc/cassandra/
ADD import.csv /var/lib/cassandra/
EXPOSE 9042
And building it.
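For instance (the tag name is an arbitrary choice):
docker build -t my-cassandra .
docker run -d --name cassandra my-cassandra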
If you are using Docker Toolbox on Windows, use the following syntax:
docker cp /C/Users/Saad/bdd-restaurants cassandra:/var/lib/docker/containers
This command will help you copy files from the host machine to a Docker container:
docker cp c:\abc.doc <containerid>:C:\inetpub\wwwroot\abc.doc
If you are trying to copy a file from Windows to an EC2 instance, use the following in cmd (with PuTTY installed):
pscp -i "D:\path_to_ppk_key" c:\file_name ubuntu@**.***.**.*:/home/ubuntu/file
Then you can copy it into the Docker container on EC2 using:
docker cp /home/ubuntu/file_name Docker_name:/home/
For those who are using WSL (Windows Subsystem for Linux), Docker, and DevContainers from VSCode (Visual Studio Code), I was able to make this work by using the WSL command line.
docker cp "/mnt/<drive letter>/source/My First Copy Command" <container id>:/workspace/destination/path
I also wrote it up in more detail.
You can also use a volume to mount the files into the container when you run it:
docker run -v /users/saad/bdd:/myfiles/tmp/ cassandra
