cx_Oracle in Azure Databricks

I am unable to establish a connection to my Oracle database from Azure Databricks, although it works in ADF, where I am able to query the table. But ADF takes time to filter the records, so I am still trying to connect from Databricks.
I followed the steps from this Microsoft link, both manually and using an init script, but the error persists.
When I looked into my cluster event log, it says the init-script execution was successful.
Error message when I tried to establish the connection:
DPI-1047: Cannot locate a 64-bit Oracle Client library: "/databricks/driver/oracle_ctl//lib/libclntsh.so: cannot open shared object file: No such file or directory".
When I executed the following command
dbutils.fs.ls("/databricks/driver/")
there was no such directory
This prompted me to post some questions here:
Does this mean the init-script did not perform its job?
Is /databricks/driver/oracle_ctl a hidden directory for dbutils.fs.ls?
The error message points to /databricks/driver/oracle_ctl//lib/libclntsh.so, but when I manually inspected the downloaded Oracle client, there was no folder called lib, although libclntsh.so exists in the main directory. Is Databricks checking the wrong directory for libclntsh.so?
Does this connection still work for others?
Syntax for connection: cx_Oracle.connect(user=user_name, password=password, dsn=IP+':'+Port+'/'+DB_name)
The above syntax works fine when connecting from an on-premises machine.
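For reference, the same connection expressed with cx_Oracle's DSN helper (the host, port, and service name here are placeholders, not my real values):
import cx_Oracle

# makedsn builds the same "host:port/service" string shown above
dsn = cx_Oracle.makedsn("10.0.0.5", 1521, service_name="ORCLPDB1")
connection = cx_Oracle.connect(user=user_name, password=password, dsn=dsn)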

Try installing the latest major release of cx_Oracle, which was renamed to python-oracledb; see the release announcement.
This version doesn't need the Oracle Instant Client. The API is the same as cx_Oracle's, although obviously the module name is different.
If I understand the instructions, your init script would do something like:
/databricks/python/bin/pip install oracledb
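For instance, wrapped in the same dbutils.fs.put pattern used elsewhere in this thread (the script name oracledb_install.sh is just an example):
dbutils.fs.put("dbfs:/databricks/<init-script-folder-name>/oracledb_install.sh","""
#!/bin/bash
# python-oracledb's default Thin mode needs no Oracle Instant Client,
# so installing the driver into the cluster's Python environment is the whole job.
/databricks/python/bin/pip install oracledb
""", True)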
Application code would look like this:
import oracledb

connection = oracledb.connect(user='scott', password=mypw, dsn='yourdbhostname/yourdbservicename')
with connection.cursor() as cursor:
    for row in cursor.execute('select city from locations'):
        print(row)
Resources:
Home page: oracle.github.io/python-oracledb/
Quick start: Quick Start python-oracledb Installation
Documentation: python-oracledb.readthedocs.io/en/latest/index.html
PyPI: pypi.org/project/oracledb/
Source: github.com/oracle/python-oracledb
Upgrading: Upgrading from cx_Oracle 8.3 to python-oracledb

Changed the path from "/databricks/driver/oracle_ctl/" to "/databricks/driver/oracle_ctl/instantclient" in the init script, and that error no longer appears.
Please use the following init script instead:
dbutils.fs.put("dbfs:/databricks/<init-script-folder-name>/oracle_ctl.sh","""
#!/bin/bash
# Install the OS library the Instant Client depends on (-y keeps apt non-interactive)
sudo apt-get install -y libaio1
# Download and unpack the Basic Light Instant Client
wget --quiet -O /tmp/instantclient-basiclite-linuxx64.zip https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
unzip /tmp/instantclient-basiclite-linuxx64.zip -d /databricks/driver/oracle_ctl/
# Normalize the versioned directory name (e.g. instantclient_21_1) to a fixed path
mv /databricks/driver/oracle_ctl/instantclient* /databricks/driver/oracle_ctl/instantclient
# Point the Spark environment at the client libraries
sudo echo 'export LD_LIBRARY_PATH="/databricks/driver/oracle_ctl/instantclient/"' >> /databricks/spark/conf/spark-env.sh
sudo echo 'export ORACLE_HOME="/databricks/driver/oracle_ctl/instantclient/"' >> /databricks/spark/conf/spark-env.sh
""", True)
Notes:
The above init script was advised by a Databricks employee and can be found here.
As mentioned by Christopher Jones in one of the comments, cx_Oracle has recently been renamed to python-oracledb, which comes in Thin and Thick variants.
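Two follow-up notes. First, dbutils.fs.ls lists DBFS rather than the driver's local disk, which is why /databricks/driver appeared to be missing; prefix the path with file:/ (or use a %sh cell) to inspect the local filesystem. Second, if you switch to python-oracledb but still need the Instant Client (Thick mode), it can be pointed at the directory this init script creates; a minimal sketch:
import oracledb

# Load the Instant Client installed by the init script to enable Thick mode
oracledb.init_oracle_client(lib_dir="/databricks/driver/oracle_ctl/instantclient")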

You will get the above error if you don't have the Oracle Instant Client on your cluster.
To resolve the above error in Azure Databricks, install the Linux Instant Client on the cluster; Databricks nodes run Linux, so the Windows client and Windows set syntax will not work there. For example, in a notebook shell cell:
%sh
mkdir -p /opt/oracle
cd /opt/oracle
wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
unzip instantclient-basiclite-linuxx64.zip
export ORACLE_HOME=$(echo /opt/oracle/instantclient_*)
export TNS_ADMIN=$ORACLE_HOME
export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
Note that variables exported in a %sh cell only last for that cell, which is why the persistent init-script approach below is preferred.
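Once the client is unpacked, you can check from Python that cx_Oracle can actually load it (the versioned directory name below is illustrative; match it to whatever unzip created):
import cx_Oracle

# Point cx_Oracle explicitly at the unpacked client libraries
cx_Oracle.init_oracle_client(lib_dir="/opt/oracle/instantclient_19_6")
print(cx_Oracle.clientversion())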
To create the init script, use the following code, as per the official doc:
dbutils.fs.put("dbfs:/databricks/<init-script-folder>/oracle_ctl.sh","""
#!/bin/bash
wget --quiet -O /tmp/instantclient-basiclite-linuxx64.zip https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
unzip /tmp/instantclient-basiclite-linuxx64.zip -d /databricks/driver/oracle_ctl/
sudo echo 'export LD_LIBRARY_PATH="/databricks/driver/oracle_ctl/"' >> /databricks/spark/conf/spark-env.sh
sudo echo 'export ORACLE_HOME="/databricks/driver/oracle_ctl/"' >> /databricks/spark/conf/spark-env.sh
""", True)
To read data from an Oracle database in PySpark, follow this article by Emrah Mete.
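As a sketch of what that looks like, assuming the Oracle JDBC driver is attached to the cluster (host, service name, and credentials below are placeholders):
df = (spark.read
    .format("jdbc")
    .option("url", "jdbc:oracle:thin:@//<host>:1521/<service_name>")
    .option("dbtable", "SCHEMA.TABLE_NAME")
    .option("user", "<user>")
    .option("password", "<password>")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .load())
df.show(5)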
For more information, refer to this official document:
https://docs.databricks.com/data/data-sources/oracle.html#oracle

Related

"bash: sed: command not found" RHEL7 when installing SQL Server

NEED HELP - LINUX NEWBIE - BROKEN LINUX RHEL7
While installing mssql-tools on RHEL7 and following the instructions to add mssql-tools to the PATH, we ran the following commands:
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc
After rebuilding the server for a second time, I've reached this Microsoft step again. Since it is the cause of my broken copy of RHEL7, a way to do this properly, and not incorrectly as listed above, would be helpful. Most articles list about 50 ways to do this. I'm looking for one way only, and it would be nice if that one way actually worked so I don't spend another part of my life rebuilding the server again.
https://learn.microsoft.com/en-us/sql/linux/quickstart-install-connect-red-hat?view=sql-server-ver15
I created an issue in Github for this failing.
https://github.com/MicrosoftDocs/sql-docs/issues/7288
The article listed here seems to do the trick:
https://dba.stackexchange.com/questions/174277/getting-sqlcmd-sqlcmd-command-not-found-in-linux
Long and short: the PATH addition should work the way it does on Windows, but I wound up using
sudo ln -s /opt/mssql-tools/bin/* /usr/local/bin/
to link the tools into /usr/local/bin. There is probably more to it, since security is always number one, but at least I was able to complete the installation of SQL Server on the RHEL7 release.
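After creating the symlinks, a quick sanity check (server, user, and password are placeholders for your own setup):
which sqlcmd
sqlcmd -S localhost -U SA -Q "SELECT @@VERSION"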

AIP backup - using Docker

I am using the cloned dspace 6-x branch and installed it via Docker. Can someone help me with backing up my local database (communities, collections, items) to a remote database?
According to the documentation we need to use the command:
dspace packager -s -t AIP -e eperson -p parent-handle file-path
But it returns an error: "dspace is not a command".
Anyone could help me transfer my local database to my remote repo?
Thanks!
Moving publications to a new repository will be a more substantial undertaking!
But your immediate problem seems to be just that you are either not in the right container or not in the right directory when executing the dspace command; thus it is "not found". Make sure to execute dspace in the dspace container and specify the right/complete path. The dspace command is located in
/path/to/your/dspace-deployment-directory/bin.
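For example, from the host, assuming the container is named dspace and DSpace is deployed under /dspace inside it (adjust both for your setup):
docker exec -it dspace /dspace/bin/dspace packager -s -t AIP -e eperson@example.com -p parent-handle /tmp/aip-file.zip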

PostgreSQL error: initdb: command not found

I was installing PostgreSQL on Ubuntu using Linuxbrew:
brew install postgresql
It seems to work fine, but since I was installing PostgreSQL for the first time, I then tried creating a database:
initdb /usr/local/var/postgres -E utf8
but it returned:
initdb: command not found
I tried running the command with sudo, but that didn't help.
Run locate initdb; it should give you a list to choose from, something like:
MacBook-Air:~ vao$ locate initdb
/usr/local/Cellar/postgresql/9.5.3/bin/initdb
/usr/local/Cellar/postgresql/9.5.3/share/doc/postgresql/html/app-initdb.html
/usr/local/Cellar/postgresql/9.5.3/share/man/man1/initdb.1
/usr/local/Cellar/postgresql/9.6.1/bin/initdb
/usr/local/Cellar/postgresql/9.6.1/share/doc/postgresql/html/app-initdb.html
/usr/local/Cellar/postgresql/9.6.1/share/man/man1/initdb.1
/usr/local/bin/initdb
/usr/local/share/man/man1/initdb.1
So in my case I want to run
/usr/local/Cellar/postgresql/9.6.1/bin/initdb
If you don't have mlocate installed, either install it or use
sudo find / -name initdb
There's a good answer to a similar question on SuperUser.
In short:
Postgres groups databases into "clusters", each of which is a named collection of databases sharing a configuration and data location, and running on a single server instance with its own TCP port.
If you only want a single instance of Postgres, the installation includes a cluster named "main", so you don't need to run initdb to create one.
If you do need multiple clusters, then the Postgres packages for Debian and Ubuntu provide a different command pg_createcluster to be used instead of initdb, with the latter not included in PATH so as to discourage end users from using it directly.
And if you're just trying to create a database, not a database cluster, use the createdb command instead.
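For example, on Debian/Ubuntu (the version number here is illustrative):
# create and start an additional cluster instead of calling initdb directly
pg_createcluster 9.6 mycluster --start
# or, if you only need a database inside the existing "main" cluster
createdb mydatabase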
I had the same problem and found the answer here.
The Ubuntu path is
/usr/lib/postgresql/9.6/bin/initdb
Edit: Sorry, Ahmed asked about linuxbrew, I'm talking about Ubuntu.
I hope this answer helps somebody.
I had a similar issue caused by brew install postgresql not properly linking postgres. The fix for me was to run:
brew link --overwrite postgresql
You can add the PATH so it runs from any location:
sudo nano ~/.profile
Inside nano, go to the end and add the following:
# set PATH so it includes user's private bin if it exists
if [ -d "/usr/lib/postgresql/14/bin/" ] ; then
    PATH="/usr/lib/postgresql/14/bin/:$PATH"
fi
and configure the alternative
sudo update-alternatives --install /usr/bin/initdb initdb /usr/lib/postgresql/14/bin/initdb 1
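Afterwards, the command should resolve from any location; a quick check:
initdb --version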

Deploy the database to Docker Container microsoft/mssql-server-linux

I have a database running on SQL Server (13.01) on Windows. I'd like to deploy it to a Docker container on Linux using SSDT.
I can connect perfectly to the server running in Docker, create/drop databases manually, and play with the data.
The problem is I cannot publish it. I'm executing the following command in PowerShell:
SqlPackage.exe /Action:Publish /SourceFile:"d.dacpac" /TargetConnectionString:"server=containeraddress;database=thedatabase;user id=sa;password=thepassword;"
and getting the following error:
Unable to connect to master or target server 'the database'. You must have a user with the same password in master or target server 'the database'. (Microsoft.Data.Tools.Schema.Sql)
I have the same user and the same password on the target and source servers.
Is there anybody who has the same problem and knows how to solve it?
I'll post this here, as most of the answers refer to having an existing compiled dacpac file, which may not always be possible. I haven't seen ideas similar to the solution I'm suggesting here posted elsewhere.
Given your usage of Docker, and if you wish to compile your Visual Studio project inside the container, certain combinations of the container base OS and image may make it impossible to create a dacpac file with msbuild.
You can work around this by restoring the database with a series of Unix commands, taking note that a Visual Studio database project is usually just a series of SQL files. Below is an example where I concatenate the SQL files into a single script and call sqlcmd to run it:
FROM mcr.microsoft.com/mssql/server
WORKDIR /init
ENV ACCEPT_EULA=Y
# Use a strong password here; SQL Server enforces complexity requirements
ENV MSSQL_SA_PASSWORD=MyPassword
EXPOSE 1433
# dos2unix is needed to strip Windows line endings from the project's SQL files
RUN apt-get update && apt-get install -y dos2unix
COPY /solution_folder/database/Tables/*.sql /init/
WORKDIR /database
RUN echo "CREATE DATABASE [database_name];\nGO\nUSE [database_name];\n" >> /database/create.sql
# Concatenate every table script into one batch, separated by GO
RUN for f in /init/*.sql; do dos2unix $f; cat $f >> /database/create.sql; echo "\nGO\n" >> /database/create.sql; done
# Start SQL Server just long enough to run the script, then shut it down
RUN ( /opt/mssql/bin/sqlservr --accept-eula & ) | grep -q "Service Broker manager has started" && /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P 'MyPassword' -i /database/create.sql && pkill sqlservr
The reason for dos2unix is that the SQL files created within Visual Studio have hidden CR/LF (and other) characters which the Linux version of sqlcmd won't interpret successfully and which cause errors (which is kind of bizarre; this is exactly the kind of thing you'd want a cross-platform database to cope with).
Also, within the final RUN command you have to start up the SQL Server service temporarily, otherwise you'll get errors. It's a bit of a fiddly workaround, and I'm not sure the Microsoft SQL Server Linux container is well designed enough for the simple task of restoring a database like this; the nuance lies between building and running a container, and it needs a happy middle ground of both concepts to work.
What's given here isn't a complete solution to a restore; it only deals with the Tables folder of the project, although it should be trivial to extend it to scalar functions and stored procedures.
Which version of SqlPackage.exe are you using? Only the most recent release candidate versions of SqlPackage.exe support SQL Server vNext CTP. The SqlPackage release candidate can be downloaded here: https://www.microsoft.com/en-us/download/details.aspx?id=54273

How to update database with script

I need to create a PostgreSQL DB on Windows. I downloaded the Windows tools from the official site and created a server, and then the problem appeared: when I try to create the DB in pgAdmin III, I get syntax errors while copying data. So I need to run the whole thing in the console. But pgAdmin only allows console mode for a DB that has already been created, and when I run it I get a shell for my empty database:
dbname->
How can I now run my script?
Have you tried using psql.exe? Assuming psql.exe is on your PATH, try:
c:\> psql.exe -h localhost -U username -f c:\mysqlscript.sql database_name
On Windows, from inside the psql shell, try:
\i c:/somedir/script2.sql
