I used Terraform to create a SageMaker notebook instance and deploy a Jupyter notebook (Python) that creates and deploys a regression model.
I was able to run the script and create the model successfully via the AWS console manually. However, I could not find a way to get it executed automatically. I even tried executing the script via shell commands through the notebook instance's lifecycle configuration, but it did not work as expected. Any other ideas, please?
Figured this out. I passed the script below to the notebook instance as its lifecycle configuration.
#!/bin/sh
# Run the commands as ec2-user, the account that owns the notebook environment
sudo -u ec2-user -i <<'EOF'
source activate python3
pip install runipy
# Run the notebook in the background so the lifecycle script finishes
# within SageMaker's time limit for lifecycle configurations
nohup runipy <<path_to_the_jupyter_notebook>> &
source deactivate
EOF
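For anyone wiring this up outside the console: a minimal boto3 sketch of attaching the same OnStart script programmatically (the lifecycle-config and notebook-instance names below are placeholders, not values from my Terraform setup):

import base64
import boto3

on_start_script = """#!/bin/sh
sudo -u ec2-user -i <<'EOF'
source activate python3
pip install runipy
nohup runipy <<path_to_the_jupyter_notebook>> &
source deactivate
EOF
"""

sm = boto3.client("sagemaker")

# Lifecycle configuration content must be base64-encoded
sm.create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="run-notebook-on-start",  # placeholder name
    OnStart=[{"Content": base64.b64encode(on_start_script.encode()).decode()}],
)

# Attach it to an existing notebook instance (the instance must be stopped)
sm.update_notebook_instance(
    NotebookInstanceName="my-notebook",  # placeholder name
    LifecycleConfigName="run-notebook-on-start",
)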
I am unable to establish a connection to my Oracle database from Azure Databricks, although it works in ADF, where I am able to query the table. But ADF takes time to filter the records, so I am still trying to connect from Databricks.
I followed the steps from this Microsoft link, both manually and using an init script, but the error persists.
When I looked into my cluster event log, it says the init-script execution was successful.
Error message when I tried to establish the connection:
DPI-1047: Cannot locate a 64-bit Oracle Client library: "/databricks/driver/oracle_ctl//lib/libclntsh.so: cannot open shared object file: No such file or directory".
When I executed the following command:
dbutils.fs.ls("/databricks/driver/")
there was no such directory.
This prompted me to post a few questions here:
Does this mean the init script did not do its job?
Is /databricks/driver/oracle_ctl a hidden directory for dbutils.fs.ls?
The error message points to /databricks/driver/oracle_ctl//lib/libclntsh.so, but when I manually inspected the downloaded Oracle client there is no folder called lib, although libclntsh.so exists in the main directory. Is Databricks checking the wrong directory for libclntsh.so?
Does this connection still work for others?
Syntax for the connection: cx_Oracle.connect(user=user_name, password=password, dsn=IP+':'+Port+'/'+DB_name)
The above syntax works fine when connecting from an on-premises machine.
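(Aside, regarding question 2: dbutils.fs.ls targets DBFS by default, so the driver's local filesystem presumably needs the file:/ prefix; a quick, hedged check in a notebook cell:)

# dbutils.fs.ls("/databricks/driver/") looks under DBFS, not the driver's local disk.
# The file:/ scheme points it at the local filesystem the init script writes to.
display(dbutils.fs.ls("file:/databricks/driver/"))

# Equivalent check with plain Python on the driver:
import os
print(os.path.exists("/databricks/driver/oracle_ctl"))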
Try installing the latest major release of cx_Oracle, which was renamed to python-oracledb; see the release announcement.
This version doesn't need Oracle Instant Client. The API is the same as cx_Oracle's, although obviously the module name is different.
If I understand the instructions, your init script would do something like:
/databricks/python/bin/pip install oracledb
Application code would be like:
import oracledb

connection = oracledb.connect(user='scott', password=mypw, dsn='yourdbhostname/yourdbservicename')
with connection.cursor() as cursor:
    for row in cursor.execute('select city from locations'):
        print(row)
Resources:
Home page: oracle.github.io/python-oracledb/
Quick start: Quick Start python-oracledb Installation
Documentation: python-oracledb.readthedocs.io/en/latest/index.html
PyPI: pypi.org/project/oracledb/
Source: github.com/oracle/python-oracledb
Upgrading: Upgrading from cx_Oracle 8.3 to python-oracledb
I changed the path from "/databricks/driver/oracle_ctl/" to "/databricks/driver/oracle_ctl/instantclient" in the init script, and that error does not appear anymore.
Please use the following init script instead:
dbutils.fs.put("dbfs:/databricks/<init-script-folder-name>/oracle_ctl.sh","""
#!/bin/bash
sudo apt-get install libaio1
wget --quiet -O /tmp/instantclient-basiclite-linuxx64.zip https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
unzip /tmp/instantclient-basiclite-linuxx64.zip -d /databricks/driver/oracle_ctl/
mv /databricks/driver/oracle_ctl/instantclient* /databricks/driver/oracle_ctl/instantclient
sudo echo 'export LD_LIBRARY_PATH="/databricks/driver/oracle_ctl/instantclient/"' >> /databricks/spark/conf/spark-env.sh
sudo echo 'export ORACLE_HOME="/databricks/driver/oracle_ctl/instantclient/"' >> /databricks/spark/conf/spark-env.sh
""", True)
Notes:
The above init script was suggested by a Databricks employee and can be found here.
As mentioned by Christopher Jones in one of the comments, cx_Oracle has recently been renamed to python-oracledb, which comes in Thin and Thick modes (see the sketch below).
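If you stay on the Instant Client route, a hedged sketch of pointing the driver explicitly at the unzipped directory (python-oracledb Thick mode; cx_Oracle 8 exposes the same init_oracle_client call), reusing the variable names from the question:

import oracledb

# Assumption: the init script above has unzipped the Instant Client into this path.
oracledb.init_oracle_client(lib_dir="/databricks/driver/oracle_ctl/instantclient")

connection = oracledb.connect(
    user=user_name,
    password=password,
    dsn=IP + ':' + Port + '/' + DB_name,  # same dsn shape as the question's cx_Oracle call
)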
You will get the above error if you don't have the Oracle Instant Client on your cluster.
To resolve the above error in Azure Databricks, run the following in a notebook cell:
%sh
# Download and unzip the Linux Instant Client, then point the Oracle
# environment variables at it. Note: these exports only apply to this shell cell.
mkdir -p /opt/oracle
cd /opt/oracle
wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
unzip instantclient-basiclite-linuxx64.zip
export ORACLE_HOME=$(ls -d /opt/oracle/instantclient_*)
export TNS_ADMIN=$ORACLE_HOME
export LD_LIBRARY_PATH=$ORACLE_HOME:$LD_LIBRARY_PATH
To create an init script, use the following code, as per the official doc:
dbutils.fs.put("dbfs:/databricks/<init-script-folder>/oracle_ctl.sh","""
#!/bin/bash
wget --quiet -O /tmp/instantclient-basiclite-linuxx64.zip https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
unzip /tmp/instantclient-basiclite-linuxx64.zip -d /databricks/driver/oracle_ctl/
sudo echo 'export LD_LIBRARY_PATH="/databricks/driver/oracle_ctl/"' >> /databricks/spark/conf/spark-env.sh
sudo echo 'export ORACLE_HOME="/databricks/driver/oracle_ctl/"' >> /databricks/spark/conf/spark-env.sh
""", True)
To read data from an Oracle database in PySpark, follow this article by Emrah Mete (see the sketch below).
For more information, refer to this official document:
https://docs.databricks.com/data/data-sources/oracle.html#oracle
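As a sketch of the Spark-level route from that doc: a minimal JDBC read, assuming the Oracle JDBC driver is installed as a cluster library (host, port, service name, table, and credentials are placeholders):

# Hedged sketch: reading an Oracle table through Spark's JDBC data source.
jdbc_url = "jdbc:oracle:thin:@//<host>:<port>/<service_name>"

df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "SCHEMA.MY_TABLE")
      .option("user", "<user>")
      .option("password", "<password>")
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .load())

df.show(5)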
We're running Flink in cluster session mode and add JARs automatically in the Dockerfile:
ADD pipeline-fat.jar /opt/flink/usrlib/pipeline-fat.jar
So that we can run this JAR via the Flink REST API without having to upload it in advance:
POST http://localhost:8081/jars/:jarid/run
But the "static" JAR is not shown when listing the JARs to get the :jarid:
GET http://localhost:8081/jars
So my question is:
Is it possible to run a usrlib JAR using the Flink REST API?
Or can you only reference such JARs via
the CLI: flink run -d -c ${JOB_CLASS_NAME} /job.jar
and standalone-job --job-classname com.job.ClassName mode?
My alternative approach (workaround) would be to upload the jar in the Docker entrypoint.sh of the jobmanager container:
curl -X POST http://localhost:8084/jars/upload \
-H "Expect:" \
-F "jarfile=#./pipeline-fat.jar"
I believe that it is unfortunately not currently possible to start a Flink cluster in session mode with a JAR pre-baked in the Docker image and then start the job using the REST API commands (as you showed).
However, your workaround approach seems like a good idea to me. I would be curious to see if it worked for you in practice.
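For what it's worth, a hedged sketch of driving that workaround from Python against the standard /jars endpoints (the jar path and entry class are placeholders):

import requests

FLINK = "http://localhost:8081"

# Upload the fat jar (the multipart form field must be named "jarfile").
with open("pipeline-fat.jar", "rb") as f:
    upload = requests.post(f"{FLINK}/jars/upload", files={"jarfile": f})
upload.raise_for_status()

# The returned "filename" ends with the id Flink assigned to the jar.
jar_id = upload.json()["filename"].split("/")[-1]

# Submit the job; entryClass is only needed if the manifest has no Main-Class.
run = requests.post(f"{FLINK}/jars/{jar_id}/run",
                    json={"entryClass": "com.example.Pipeline"})  # placeholder class
run.raise_for_status()
print(run.json())  # contains the job id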
I managed to run a usrlib JAR using the command-line interface.
I edited docker-compose to run a custom docker-entrypoint.sh.
I added the following to the original docker-entrypoint.sh:
run_user_jars() {
  echo "Starting user jars"
  exec ./bin/flink run /opt/flink/usrlib/my-job-0.1.jar &
}
run_user_jars
...
And edited the original entrypoint for the jobmanager in the docker-compose.yml file:
entrypoint: ["bash", "/opt/flink/usrlib/custom-docker-entrypoint.sh"]
I am new to Debezium, Kafka, and Docker. I have successfully installed Docker and it is running on my localhost.
I am attempting to go through the Debezium tutorial at: https://github.com/debezium/debezium-examples/blob/master/tutorial/README.md#debezium-tutorial
I went to the section for SQL Server, and the first step says to "Start the topology as defined in https://debezium.io/docs/tutorial/". I successfully ran through that tutorial, but it is for MySQL and not MS SQL Server. Anyway, I went back to the ../debezium-tutorial page, and the first lines tell me to run:
export DEBEZIUM_VERSION=1.1
docker-compose -f docker-compose-sqlserver.yaml up
The tutorial does not discuss how to create docker-compose-sqlserver.yaml. I checked Debezium's GitHub site for this file and it is not there. Am I supposed to create this file manually, or am I missing something in the steps?
In order to get Debezium to work, am I supposed to create and run a SQL Server instance in Docker, or can I use the instance that is running on my localhost?
The Docker Compose file is included in the tutorials repository:
git clone https://github.com/debezium/debezium-examples.git
cd debezium-examples/tutorial
export DEBEZIUM_VERSION=1.1
docker-compose -f docker-compose-sqlserver.yaml up
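Once the topology is up, the next step in that tutorial registers the SQL Server connector through the Kafka Connect REST API; a hedged sketch with placeholder connection values (the repository ships a register-sqlserver.json with the exact settings):

import requests

# Placeholder values; take the real ones from the tutorial's register-sqlserver.json.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        "database.hostname": "sqlserver",   # hostname as seen from the Connect container
        "database.port": "1433",
        "database.user": "<user>",
        "database.password": "<password>",
        "database.dbname": "<database>",
        "database.server.name": "server1",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())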
I deployed an app with gcloud preview app deploy.
Is there a way to download it to another local machine?
How can I get the files? I tried via SSH with no success (I can't access the Docker dir).
UPDATE:
I found this:
gcloud preview app modules download default --version 1 --output-dir=my_dir
but it's not downloading any files.
Log
Downloading module [default] to [my_dir/default]
Fetching file list from server...
|- Downloading [0] files... -|
I am coming back to Google App Engine after two years, and I see that they have made lots of improvements and added tons of features. But sadly, their documentation sometimes leaves much to be desired.
I used to download the code of an uploaded version with appcfg.py using the following command:
appcfg.py download_app -A <app_id> -V <version> <output-dir>
But of course, now everything has been consolidated into the gcloud shell, where appcfg.py is not accessible.
However, the following method helped me download the deployed code:
Go to the console and into Google App Engine.
Select the project you want to work with.
Once the project's dashboard opens, click the icon at the top right to open the built-in console window, which loads the Cloud Shell at the bottom. Now, if you check, appcfg.py is available for you to use in this VM.
Hence, use appcfg.py download_app -A <app_id> -V <version> <output-dir> to download the code.
Now, once you have the code in the desired folder, in order to download it to your local machine you can open the Cloud Shell code editor.
Here I assumed that if I right-clicked and exported the desired folder it would work, but instead it gave me the following error message:
{"Error":"'concurrency' must be a number but it is [object Undefined]","Message":"'concurrency' must be a number but it is [object Undefined]"}
So I thought maybe it would play along nicely if the folder was an archive. Go back to the Cloud Shell and, using whatever utility you fancy, make an archive of the folder:
zip -r mycode.zip mycode
Go back to the code editor, then export and download.
Now, of course, there might be many more ways to do it (hopefully), but this is what made sense to me after returning to Google App Engine after two years.
Currently, the best way to do this is to pull the files out of Docker.
Put the instance into self-managed mode so that you can SSH into it:
$ gcloud preview app modules set-managed-by default --version 1 --self
Find the name of the instance:
$ gcloud compute instances list | grep gae-default-1
Copy it out of the Docker container, change the permissions, and copy it back to your local machine:
$ gcloud compute ssh --zone=us-central1-f gae-default-1-1234 'sudo docker cp gaeapp:/app /tmp'
$ gcloud compute ssh --zone=us-central1-f gae-default-1-1234 "chown -R $USER /tmp/app"
$ gcloud compute copy-files --zone=us-central1-f gae-default-1-1234:/tmp/app /tmp/
$ ls /tmp/app
Dockerfile
[...]
IMHO, the best option today (Aug 2018) is:
Under the main menu, under Products, go to Tools -> Cloud Build -> Build history.
There, click the ID of the build you want.
Then, in the window that opens (Build details), click the source link and the download of your compressed code begins.
As simple as that.
HTH.
As of Feb 2021, you can install appengine-sdk using pip
pip install appengine-sdk
Once installed, appcfg can be used to download the app code.
python -m appcfg download_app -A app_id [ -V version ] out-dir
Nothing else worked for me. Finally, I found the source code this way: simply go to Google Cloud Storage, choose the bucket whose name starts with us.artifacts...., select containers > images, and download the latest one (sort by created date). Unzip the downloaded file; it will contain all the deployed source code of the App Engine app.
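If you'd rather script that than click through the console, a hedged sketch with the google-cloud-storage client (the bucket name is the us.artifacts… bucket from your own project and stays a placeholder here):

from google.cloud import storage

# Placeholder: fill in the full name of the bucket that starts with "us.artifacts"
# in your project (visible in the Cloud Storage browser).
BUCKET = "<us.artifacts-bucket-name>"

client = storage.Client()
blobs = list(client.list_blobs(BUCKET, prefix="containers/images"))

# Pick the most recently created object, mirroring the "sort by created date" step.
latest = max(blobs, key=lambda b: b.time_created)
latest.download_to_filename("latest-artifact")
print(latest.name, latest.time_created)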
I have now managed to deploy a web app on RUN@cloud. I do have the CloudBees Deployer plugin on Jenkins; however, I am looking for a way to use the Bees SDK to bind a database to the deployed app. I was wondering how I go about it.
Currently, I deploy it via Jenkins as a post-build action.
You can configure the Bees SDK in DEV@cloud with a script like this (assuming that you have uploaded a build secret zip file containing your ~/.bees/bees.config using the environment variable ${SECRET}; please see the Build Secret Plugin).
Run this as an "Execute Shell" task within Jenkins, and then you can use the Bees SDK in the normal way to bind the database (or any resource) to your app, e.g.
bees app:bind -a acme/test -db mydb
See the Database Guide for more details.
Jenkins Execute Shell Script:
if [[ ! -d "${WORKSPACE}/bees-sdks" ]]
then
mkdir ${WORKSPACE}/bees-sdks
fi
cd ${WORKSPACE}/bees-sdks;
curl -o cloudbees-sdk-1.5.0-bin.zip http://cloudbees-downloads.s3.amazonaws.com/sdk/cloudbees-sdk-1.5.0-bin.zip;
unzip -o cloudbees-sdk-1.5.0-bin.zip
rm cloudbees-sdk-1.5.0-bin.zip
PATH=${WORKSPACE}/bees-sdks/cloudbees-sdk-1.5.0:$PATH; export PATH
if [[ ! -d ~/.bees ]]
then
mkdir ~/.bees
fi
cp ${SECRET}/bees.config ~/.bees/bees.config
I've done an online example here that illustrates how this all works. Sorry this is a little more complicated than we would like: we are working to make it smoother and I will update this answer shortly once the changes go live.