I have created a model using SageMaker (on an AWS ML notebook).
I then exported that model to S3, where a .tar.gz file was created.
I'm trying to find a way to load the model object into memory in my code (without using AWS Docker images and deployment) and run a prediction on it.
I looked for functions to do that in the model section of the SageMaker docs, but everything there is tightly coupled to the AWS Docker images.
I then tried opening the file with the tarfile and shutil packages, but that didn't get me anywhere.
Any ideas?
With the exception of XGBoost, built-in algorithms are implemented with Apache MXNet, so simply extract the model from the .tar.gz file and load it with MXNet: load_checkpoint() is the API to use.
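For example, here is a minimal sketch using the MXNet Module API, assuming the extracted checkpoint files are named model_algo-1-symbol.json and model_algo-1-0000.params and that the network has a single 'data' input (the prefix, epoch, and input shape all depend on the algorithm you trained):
import mxnet as mx
# loads <prefix>-symbol.json and <prefix>-<epoch>.params
sym, arg_params, aux_params = mx.model.load_checkpoint('model_algo-1', 0)
mod = mx.mod.Module(symbol=sym, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])  # shape is illustrative
mod.set_params(arg_params, aux_params, allow_missing=True)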
XGBoost models are just pickled Booster objects. Unpickle the file (the xgboost package must be installed) and you get the model back:
$ python3
>>> import xgboost, pickle
>>> model = pickle.load(open("xgboost-model", "rb"))
>>> type(model)
<class 'xgboost.core.Booster'>
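To run a prediction on the unpickled Booster, wrap your features in a DMatrix; a quick sketch, where X stands for a hypothetical 2-D NumPy array of features:
>>> import xgboost as xgb
>>> preds = model.predict(xgb.DMatrix(X))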
Models trained with the built-in framework containers (TensorFlow, MXNet, PyTorch, etc.) are vanilla framework models that can be loaded as-is with the corresponding library.
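For instance, a sketch for a PyTorch artifact, assuming the extracted archive contains a state dict saved as model.pth and that MyNet is your own network class (both names are hypothetical):
import torch

model = MyNet()                     # your network class, defined in your own code
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
model.eval()                        # switch to inference mode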
Hope this helps.
I am trying to build on top of the SageMaker pre-built image sagemaker-base-python-310 (listed here), for example by starting from the pre-built image and adding additional requirements to it.
Could someone point me to where the definition of the container underlying such an image (e.g. its Dockerfile) can be found?
What I tried
Searched on AWS GH Repos
Searched in ECS
SageMaker Studio images aren't available publicly. The DLC-based images (such as PyTorch, TF, etc.) are essentially built on top of the frameworks (see https://github.com/aws/deep-learning-containers), but Base Python, Data Science, etc. are not open source.
If you're looking to add packages as part of customization, I'd recommend using lifecycle configuration (LCC) scripts.
This is similar to the question of whether a trained model can be deployed on another platform without a dependency on SageMaker or other AWS services.
I have trained a model on AWS SageMaker using the built-in Semantic Segmentation algorithm. The trained model, named model.tar.gz, is stored on S3. I want to download this file from S3 and use it for inference on my local PC, without using AWS SageMaker anymore. Since the built-in Semantic Segmentation algorithm is built with the MXNet Gluon framework and the GluonCV toolkit, I have been referring to the MXNet and GluonCV documentation to run inference locally.
Downloading the file from S3 is easy, and extracting it yields three files:
hyperparams.json: includes the parameters for network architecture, data inputs, and training. Refer to Semantic Segmentation Hyperparameters.
model_algo-1
model_best.params
Both model_algo-1 and model_best.params are trained model parameters, and I think they are the output of net.save_parameters (refer to Train the neural network). I can also load them with the function mxnet.ndarray.load.
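For example, this is roughly how I peeked at a parameter file (just a sketch; the keys printed depend on the network):
import mxnet as mx

params = mx.nd.load('./model/model_algo-1')
print(list(params.keys())[:5])  # a few parameter names, e.g. layer weights and biases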
Referring to Predict with a pre-trained model, I found there are two necessary steps:
Reconstruct the network for making inference.
Load the trained parameters.
As for reconstructing the network, since I used PSPNet for training, I can use the class gluoncv.model_zoo.PSPNet to rebuild it. I also know how to use some AWS SageMaker services, for example batch transform jobs, to run inference, and I want to reproduce that on my local PC. However, if I reconstruct the network with gluoncv.model_zoo.PSPNet, I can't be sure that its configuration matches what was used on AWS SageMaker at inference time, because I can't inspect the image 501404015308.dkr.ecr.ap-northeast-1.amazonaws.com/semantic-segmentation:latest in detail.
As for loading the trained parameters, I can use load_parameters, but I don't know whether I should use model_algo-1 or model_best.params.
The following code works well for me.
import mxnet as mx
from mxnet import image
from gluoncv.data.transforms.presets.segmentation import test_transform
import gluoncv
# use cpu
ctx = mx.cpu(0)
# load test image
img = image.imread('./img/IMG_4015.jpg')
img = test_transform(img, ctx)
img = img.astype('float32')
# reconstruct the PSP network model
model = gluoncv.model_zoo.PSPNet(2)
# load the trained model
model.load_parameters('./model/model_algo-1')
# make inference
output = model.predict(img)
predict = mx.nd.squeeze(mx.nd.argmax(output, 1)).asnumpy()
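As an optional follow-up, you can turn the class-index map into a colored mask and save it; a sketch, assuming the 'pascal_voc' palette is acceptable for your classes:
from gluoncv.utils.viz import get_color_pallete

# map each predicted class index to a color and save as an image
mask = get_color_pallete(predict, 'pascal_voc')
mask.save('./output.png')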
I have a question about using TensorFlow on Google Cloud Platform.
I heard that TensorFlow on Google Cloud doesn't support Keras (keras.io). However, I can now see that TensorFlow has its own API for accessing Keras (https://www.tensorflow.org/api_docs/python/tf/contrib/keras).
Given this, can I use the above-mentioned API on Google Cloud, since it ships with the TensorFlow package? Any ideas?
I am able to access this API from the TensorFlow installed on an Anaconda machine.
Option 1: try the --package-path option.
As per the docs...
--package-path=PACKAGE_PATH
"Path to a Python package to build. This should point to a directory containing the Python source for the job"
Try giving a relative path to keras from your main script.
More details here:
https://cloud.google.com/sdk/gcloud/reference/beta/ml-engine/local/train
Option 2: if you have a setup.py file
Inside your setup.py, pass the argument install_requires=['keras'] to the setup() call.
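A minimal sketch of such a setup.py (the package name and version are placeholders):
from setuptools import setup, find_packages

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),
    install_requires=['keras'],  # installed on the Cloud ML Engine workers before training starts
)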
Google Cloud Machine Learning Engine does support Keras (keras.io), but you have to list it as a dependency when starting a training job. For some specific instructions, see this SO post, or a longer exposition on this blog page. If you'd like to serve your model on Google Cloud Machine Learning or using TensorFlow Serving, then see this SO post about exporting your model.
That said, you can also use tf.contrib.keras, as long as you use the --runtime-version=1.2 flag. Just keep in mind that packages in contrib are experimental and may introduce breaking API changes between versions.
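As a quick illustration of that contrib path (a sketch; the layers and shapes are arbitrary and assume runtime version 1.2):
import tensorflow as tf

keras = tf.contrib.keras  # the Keras API bundled with TensorFlow 1.2
model = keras.models.Sequential()
model.add(keras.layers.Dense(64, activation='relu', input_shape=(10,)))
model.add(keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse')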
Have a look at this example on GitHub, which I saw was recently added:
Keras Cloud ML Example
I'm feeling stupid, but I want to know how GitHub and Dropbox store user files, because I have a similar problem and I need to store users' project files.
Is it just a matter of storing project files somewhere on the server and referring to the location in a database field, or are there better methods?
Thanks.
GitHub uses Git to store repositories, and accesses those repos from their Ruby application. They used to do this with Grit, a Ruby library. Grit was written to implement Git in Ruby but has been replaced with rugged. There are Git reimplementations in other languages like JGit for Java and Dulwich for Python. This presentation gives some details about how GitHub has changed over the years and is worth watching/browsing the slides.
If you wanted to store Git repositories, what you'd want to do is store them on a filesystem (or a cluster thereof) and then have a pointer in your database to point to where the filesystem is located, then use a library like Rugged or JGit or Dulwich to read stuff from the Git repository.
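For instance, a minimal sketch with Dulwich, where the repository path is an illustrative value that would normally come from your database:
from dulwich.repo import Repo

repo = Repo('/srv/repos/user42/project.git')
commit = repo[repo.head()]           # the commit HEAD points at
tree = repo[commit.tree]             # its top-level tree
for name, mode, sha in tree.iteritems():
    print(name.decode())             # list the top-level entries in the repository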
Dropbox stores files on Amazon's S3 service and then implements some wrappers around that for security and so on. This paper describes the protocol that Dropbox uses.
The actual question you've asked is how do you store user files. The simple answer is... on the filesystem. There are plugins for a lot of popular web frameworks for doing user file uploads and file management. Django has Django-Filer for instance. The difficulty you'll encounter in rolling your own file upload management system is building a sensible way to do permissions (so users can only download the files they are entitled to download), so it is worth looking into how the various framework plugins do it.
Are there tutorials for the SDK (or at least an example) on how to create an export plugin (to extract polygons from a scene)?
In LightWave, you're not forced to write an export plugin to extract the polygons from a scene/object: the LWO and LWS formats are documented well enough to parse them quite easily.
The file format documentation is in the filefmts folder of the SDK. You can also find libraries that parse LightWave files, such as the Open Asset Import Library.
If you still need to do it as a plugin, there's a sample plugin for Modeler in the sample/Modeler/Input-Output/vidscape folder.
You can find a reasonable number of LW export plugins on the web. Some of them are compiled .p plugins, but others are Python .py or LScript .ls scripts. The latter two can easily be edited and tweaked to individual needs with a text editor of your choice.
There are also wikis on the web about the LightWave API commands available from scripts.