Lambda Layer Function for Snowflake

I have followed the standard method for creating a Lambda Layer for Snowflake.
[on an AWS EC2 instance]
rm -rf snowflake
mkdir -p snowflake/python/lib/python3.7/site-packages
pip3 install --no-cache-dir --ignore-installed --upgrade snowflake-connector-python -t snowflake/python/lib/python3.7/site-packages
cd snowflake; rm -f snowflake.zip; zip -r snowflake.zip .
I can create the Lambda Layer, add it to my Lambda function, and validate that the layer is attached, but when I call the Lambda function it fails on
import snowflake.connector
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'snowflake'
Are there any Snowflake Lambda libraries or detailed guides on what may be going wrong here? I build Lambda Layers this way all the time, and usually the above process works.
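For reference, a layer built this way is typically published and attached roughly as follows (a sketch; the function name is a placeholder and the layer ARN comes from the publish call):
aws lambda publish-layer-version \
    --layer-name snowflake \
    --zip-file fileb://snowflake.zip \
    --compatible-runtimes python3.7
aws lambda update-function-configuration \
    --function-name <my-function> \
    --layers <layer-version-arn-returned-above>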

The issue turned out to be a Python 3.7 requirement. I changed the runtime of my Lambda function to Python 3.7, to match the Python version I used to build the library, and then added the following step to my layer-build script:
ssh -i ${PEM_FILE} ${EC2_HOST} "cp -r snowflake/python/lib/python3.7/site-packages/* dblayer/python/lib/python3.7/site-packages"
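With the runtime and layer versions aligned, a minimal handler like the following can be used to verify that the import resolves (a sketch; the connection parameters are placeholders, not real credentials):
import snowflake.connector

def lambda_handler(event, context):
    # The import above only succeeds if the layer's site-packages path
    # matches the function's Python runtime version.
    ctx = snowflake.connector.connect(
        user="<user>",
        password="<password>",
        account="<account>",
    )
    try:
        cur = ctx.cursor()
        cur.execute("SELECT CURRENT_VERSION()")
        return {"snowflake_version": cur.fetchone()[0]}
    finally:
        ctx.close()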

Related

How to completely download Anaconda Cloud bz2 files and dependencies for offline package installation? [duplicate]

I want to create a Python environment with the data science libraries NumPy, Pandas, PyTorch, and Hugging Face Transformers. I use Miniconda to create the environment and to download and install the libraries. conda install has a --download-only flag to download the required packages without installing them, so that they can be installed later from a local directory. However, even when conda only downloads the packages without installing them, it still extracts them.
Is it possible to download the packages without extracting them, and extract them later just before installation?
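(For reference, the flag mentioned above is used roughly like this; as noted, it populates the package cache but still extracts the packages:)
conda install --download-only -c conda-forge -c pytorch numpy pandas pytorch transformers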
There is no simple CLI option to prevent the extraction step. Extraction is regarded as part of the FETCH operation, which populates the package cache before the LINK operation transfers the package into the specified environment.
The alternative is to do some of it manually. Naively, one could search Anaconda Cloud and download packages by hand, but it is probably better to go through the solver to ensure package compatibility. All the info about the operations to be run can be viewed by including the --json flag; this output can be filtered down to just the tarball URLs, which can then be downloaded directly. Here's a script along these lines (assuming Linux/Unix):
File: conda-download.sh
#!/bin/bash -l
conda create -dn null --json "$@" |\
grep '"url"' | grep -oE 'https[^"]+' |\
xargs wget -c
which can be used as
./conda-download.sh -c conda-forge -c pytorch numpy pandas pytorch transformers
that is, it accepts all arguments conda create would, and will download all the tarballs locally.
Ignoring Cached Packages
If you already have some packages cached then the above will not redownload them. Instead, if you wish to download all tarballs needed for an environment, then you could use this alternate version which overrides the package cache using an empty temporary directory:
File: conda-download-all.sh
#!/bin/bash -l
tmp_dir=$(mktemp -d)
CONDA_PKGS_DIRS=$tmp_dir conda create -dn null --json "$@" |\
grep '"url"' | grep -oE 'https[^"]+' |\
xargs wget -c
rm -r $tmp_dir
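Once the tarballs are downloaded, one way to consume them offline is to lay them out as a local channel and install from it. This is only a sketch, not part of the original scripts: it assumes conda-build is installed (which provides conda index), that linux-64 matches your platform, and that all required dependencies are present in the channel:
mkdir -p local-channel/linux-64
mv ./*.tar.bz2 local-channel/linux-64/
conda index ./local-channel
conda create -n offline-env --offline -c file://$PWD/local-channel numpy pandas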
Do you really want to use conda-pack? That lets you archive a conda environment for reproduction without using the internet or re-solving dependencies. If you just want to avoid re-solving, you can also use conda env export --explicit, but that still ties you to the source (the internet or a local disk repository).
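For completeness, a typical conda-pack round trip looks roughly like this (a sketch; environment names and target paths are placeholders):
conda install -n base conda-pack          # or: pip install conda-pack
conda pack -n myenv -o myenv.tar.gz       # archive the environment
# on the offline target machine:
mkdir -p /opt/myenv
tar -xzf myenv.tar.gz -C /opt/myenv
source /opt/myenv/bin/activate
conda-unpack                              # rewrite prefix paths for the new location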
If you have a static (read-only) environment and want to really reduce Docker image size, you can volume-mount the environment at runtime. You would need to match the file paths (i.e. mount /opt/anaconda to /opt/anaconda).

How can we execute a Jupyter notebook Python script automatically in SageMaker?

I used Terraform to create a SageMaker notebook instance and deploy a Jupyter notebook Python script that creates and deploys a regression model.
I was able to run the script and create the model successfully via the AWS console manually. However, I could not find a way to get it executed automatically. I even tried executing the script via shell commands through the notebook instance's lifecycle configuration, but it did not work as expected. Any other ideas?
Figured this out. I passed the script below to the notebook instance as its lifecycle configuration.
#!/bin/sh
sudo -u ec2-user -i <<'EOF'
source activate python3
pip install runipy
nohup runipy <<path_to_the_jupyter_notebook>> &
source deactivate
EOF
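If the instance is managed with Terraform or the AWS CLI rather than the console, the same script can be attached as a lifecycle configuration roughly like this (a sketch; names are placeholders, and the instance typically has to be stopped before it can be updated):
aws sagemaker create-notebook-instance-lifecycle-config \
    --notebook-instance-lifecycle-config-name run-notebook-on-start \
    --on-start Content="$(base64 -w0 on-start.sh)"
aws sagemaker update-notebook-instance \
    --notebook-instance-name <my-notebook-instance> \
    --lifecycle-config-name run-notebook-on-start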

Create a custom AWS Sagemaker Estimator with a configurable entrypoint

I'm writing a custom Estimator in AWS SageMaker, for a framework that isn't supported out of the box. I have my own Docker image for training, with the training code bundled into the image, which forces me to rebuild the image every time the code changes.
What I would like to do is create an Estimator that uses this image and accepts a file as an entry point, like the built-in framework estimators do (e.g. TensorFlow).
From reading the source code for the SageMaker Python SDK, I found the sagemaker.estimator.Framework class, which accepts the entry_point argument and from which the built-in framework estimators inherit. However, the documentation doesn't really show how to inherit from that class in my own code.
Is it possible to write a custom Estimator class that inherits from Framework, or is there another way to create a custom estimator that receives an entry_point argument?
Instead of using a generic Estimator, use a TensorFlow estimator and pass your custom image via image_uri. This way you get the best of both worlds. And if you have a requirements.txt file in your source_dir, it gets installed before training.
Here is a Dockerfile for a custom image:
FROM python:3.9-buster
RUN apt-get -y update && apt-get install -y --no-install-recommends \
wget \
python3-pip \
python3-setuptools \
nginx \
ca-certificates \
vim \
&& rm -rf /var/lib/apt/lists/*
#RUN ln -s /usr/bin/python3 /usr/bin/python
RUN ln -s /usr/bin/pip3 /usr/bin/pip
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
RUN pip install sagemaker-training
and here is how you call your custom image:
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                          source_dir='.',
                          role=role,
                          instance_count=1,
                          instance_type='ml.p3.2xlarge',
                          image_uri='<arn of custom image>',
                          script_mode=True,
                          hyperparameters={
                              'epochs': 2,
                              'batch-size': 256,
                              'learning-rate': 0.01})
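Once constructed, the estimator is launched like any built-in framework estimator, for example (the S3 path is a placeholder):
tf_estimator.fit({'training': 's3://<your-bucket>/<prefix>/train'})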
The existing framework estimators in the SageMaker Python SDK can be a good place to start, e.g. https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/pytorch/estimator.py
However, it would probably be easier, since you've already bundled your training code into the image, to set an entrypoint in your Dockerfile that runs the training code. For more about how Amazon SageMaker runs your training image, see https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-dockerfile.html
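If you do want to inherit from Framework directly, a skeleton along the following lines mirrors what the built-in estimators do. This is a sketch rather than a tested implementation: MyFramework is a hypothetical name, and keyword names such as image_uri vary between SDK versions (older releases used image_name):
from sagemaker.estimator import Framework

class MyFramework(Framework):
    _framework_name = "my-framework"  # hypothetical identifier

    def __init__(self, entry_point, source_dir=None, hyperparameters=None,
                 image_uri=None, **kwargs):
        # The base class uploads entry_point/source_dir to S3 and passes their
        # location to the training container via hyperparameters.
        super().__init__(entry_point, source_dir=source_dir,
                         hyperparameters=hyperparameters,
                         image_uri=image_uri, **kwargs)

    def create_model(self, **kwargs):
        # Framework declares this abstract method; return a Model for serving,
        # or raise if the estimator is only used for training.
        raise NotImplementedError("serving is not covered by this sketch")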

Google App Engine PHP SDK - How to install on Ubuntu (15.10)?

Google's official documentation is available here:
https://cloud.google.com/appengine/downloads#Google_App_Engine_SDK_for_PHP
But it doesn't provide sufficient information about the following step:
"4 - Build and install the PHP interpreter and App Engine PHP extension. Specify the path to php-cgi and gae_runtime_module.so when running the development server."
I'm using a new Virtualbox machine with Ubuntu 15.10 and PhpStorm to test GAE.
Could someone please provide clear instructions about step 4? What do I need to do to install the php interpreter and the App Engine php extension?
P.S. I've already searched with Google but only found old/confusing tutorials.
That GAE PHP extension seems to be quite a new thing; I don't remember using it with the SDK on Ubuntu 14.04.
You need to build PHP and that extension from source. Grab the latest PHP 5.5 branch from the PHP source repo (http://php.net/git.php) and build it. That linked page contains instructions on building PHP, but the procedure is similar to the following:
$ git clone <php-src>
$ cd ./php-src/
$ git checkout PHP-5.5
$ ./buildconf
$ ./configure --prefix="/opt/php55"
$ sudo make && sudo make install
Remember to pick the modules and packages you want to compile into PHP 5.5 for use with the SDK. I think Google has an official list of the modules and extensions they use inside GAE PHP and inside the SDK PHP. The --prefix argument tells the build where to install the resulting binaries.
Then you need to get the source for the PHP extension and build it:
$ git clone https://github.com/GoogleCloudPlatform/appengine-php-extension
$ cd appengine-php-extension
$ phpize # remember to use the phpize from the just built PHP5.5 binaries
$ ./configure
$ sudo make && sudo make install
(That Git repository contains detailed building instructions so you should probably refer to them when building.)
Enable the resulting .so in the configuration files of the PHP 5.5 you just built.
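For example, with the --prefix used above this could be as simple as the following (a sketch; the php.ini location depends on your build configuration):
$ echo "extension=gae_runtime_module.so" >> /opt/php55/lib/php.ini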
After that you need to install the PHP SDK and configure it to use the newly built PHP binary
$ dev_appserver.py <...> --php_executable_path=/opt/php55/bin/php-cgi
The SDK will let you know if the built PHP binaries are incompatible with the SDK version you use. I remember compiling the PHP from source around 5 times before it worked without any warnings.
But essentially they are telling you to compile PHP from source, then compile their extension from source and then use the built PHP+extension with the downloaded SDK. These instructions are from the top of my head so you may need to adjust the commands and procedures.
The process can be simplified by using Docker; here is an image you can use: https://hub.docker.com/r/mhariri/docker-google-appengine-php/
To run your app, you just need Docker installed; then run the following command in your app directory:
docker run -it -v $(pwd):/app --rm --net=host mhariri/docker-google-appengine-php

How do I customise a Google App Engine Managed VM with a Standard Runtime?

I would like to customise a (Python) Standard Runtime Managed VM.
In theory, this should be possible by adding some extra commands to the VM Dockerfile.
Google's documentation states that a VM Dockerfile is automatically generated when the App is first deployed;
If you are using a standard runtime, the SDK will create a Dockerfile for you the first time you run the gcloud preview app deploy commands. The file will exist in a predetermined location:
If you are developing in Java, the Dockerfile appears in the root of the compiled Web Application Archive directory (WAR)
If you are developing in Python or Go, the Dockerfile appears in the root of your application directory.
And that extra commands can indeed be added;
You can add more docker commands to this file, while continuing to run and deploy your app with the standard runtime declaration.
However, in practice the Dockerfile is automatically deleted immediately after deployment completes, preventing any customisation.
Has anyone managed to add Dockerfile commands to a Managed VM with a Standard Runtime? Any help would be gratefully appreciated.
I tried the same thing and did not succeed. There is however an equivalent way of doing this that I fell back to.
You can create a custom runtime that mimics the standard runtime.
You can do this because Google provides the Docker base images for all the standard runtimes. Mimicking a standard runtime is therefore simply a matter of selecting the right base image in the Dockerfile of the custom runtime. For the standard Python App Engine VM the Dockerfile is:
FROM gcr.io/google_appengine/python-compat
ADD . /app
Now that you have recreated the standard runtime as a custom runtime, you can modify the Dockerfile to make any customizations you need.
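For example, a customized version that installs an extra OS package could look like this (the added package is purely illustrative):
FROM gcr.io/google_appengine/python-compat
# custom additions, e.g. extra OS packages your app needs
RUN apt-get update && \
    apt-get install -y --no-install-recommends libespeak-dev && \
    rm -rf /var/lib/apt/lists/*
ADD . /app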
Important Note
The development server does not support custom Dockerfiles (you will get an error about --custom_entrypoint), so you have to move your test environment to App Engine servers if you are doing this. I think this is true regardless of whether you are using a standard runtime and customizing the Dockerfile or using a custom runtime. See this answer.
A note about the development server not working with custom runtimes - dev_appserver.py doesn't deal with Docker or Dockerfiles, which is why it complains about needing you to specify --custom_entrypoint. However as a workaround you can manually set up the dependencies locally. Here's an example using 'appengine-vm-fortunespeak' which uses a custom runtime based on python-compat:
$ git clone https://github.com/GoogleCloudPlatform/appengine-vm-fortunespeak-python.git
$ cd appengine-vm-fortunespeak-python
# Local dependencies from Dockerfile must be installed manually
$ sudo pip install -r requirements.txt
$ sudo apt-get update && sudo apt-get install -y fortunes libespeak-dev
# We also need gunicorn since its used by python-compat to serve the app
$ sudo apt-get install gunicorn
# This is straight from dev_appserver.py --help
$ dev_appserver.py app.yaml --custom_entrypoint="gunicorn -b localhost:{port} main:app"
Note that if you are using any of the non-compat images, you can run your app directly using Docker, since they don't need to emulate the legacy App Engine API. For example, using 'getting-started-python', which uses the python runtime:
$ git clone https://github.com/GoogleCloudPlatform/getting-started-python.git
$ cd getting-started-python/6-pubsub
# (Configure the app according to the tutorial ...)
$ docker build .
$ docker images # (note the IMAGE_ID)
$ docker run -p 127.0.0.1:8080:8080 -t IMAGE_ID
Try the above with any -compat images and you will have problems - for example on python-compat you'll see initialization errors in runtime/google/appengine/tools/vmboot.py. It needs to be run on a real Managed VM instance.
