An issue with inference speed while using SageMaker Neo - amazon-sagemaker

I am a student studying SageMaker Neo.
I am working through this tutorial:
Training and Serving with TensorFlow on Amazon SageMaker
https://github.com/aws/amazon-sagemaker-examples/blob/master/aws_sagemaker_studio/frameworks/tensorflow_mnist/tensorflow_mnist.ipynb
What I'm curious about is that the inference speed is similar when using a c5 instance and when using a p2 instance.
Please let me know what I am missing.

The tutorial doesn't have anything to do with Neo. You can try out Neo using this example - https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.ipynb.
Also, SageMaker Inference Recommender is a good tool to standardize the testing of model performance - https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender.html

Related

PyTorch Lightning with Amazon SageMaker

We're currently using PyTorch Lightning for training outside of SageMaker. We're looking to use SageMaker to leverage distributed training, checkpointing, model training optimization (Training Compiler), etc., to accelerate the training process and save costs. What's the recommended way to migrate PyTorch Lightning scripts to run on SageMaker?
The easiest way to run PyTorch Lightning on SageMaker is to use the SageMaker PyTorch estimator (example) to get started. Ideally you will add a requirements.txt for installing pytorch-lightning along with your source code.
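For example, a minimal requirements.txt placed next to the entry script might look like the following (the version pin is illustrative, not prescriptive -- pick one compatible with your framework version):

```
pytorch-lightning==1.9.5
```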
Regarding distributed training, Amazon SageMaker recently launched native support for running PyTorch Lightning based distributed training. Please follow the links below to set up your training code:
https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-modify-sdp-pt-lightning.html
https://aws.amazon.com/blogs/machine-learning/run-pytorch-lightning-and-native-pytorch-ddp-on-amazon-sagemaker-training-featuring-amazon-search/
There's no big difference in running PyTorch Lightning and plain PyTorch scripts with SageMaker.
One caveat, however, when running distributed training jobs with DDPPlugin: set the NODE_RANK environment variable properly at the beginning of the script, because PyTorch Lightning knows nothing about SageMaker environment variables and relies on generic cluster variables:
os.environ["NODE_RANK"] = str(int(os.environ.get("CURRENT_HOST", "algo-1")[5:]) - 1)
or (more robust):
import json, os
rc = json.loads(os.environ.get("SM_RESOURCE_CONFIG", "{}"))
os.environ["NODE_RANK"] = str(rc["hosts"].index(rc["current_host"]))
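To see what the robust variant produces, here is a self-contained sketch that simulates the resource-config variable SageMaker injects into each training container (the `algo-N` host names are the ones SageMaker uses by default; the three-node cluster is just an example):

```python
import json
import os

# Simulate the resource config SageMaker would set inside the second
# container of a 3-node training cluster.
os.environ["SM_RESOURCE_CONFIG"] = json.dumps(
    {"current_host": "algo-2", "hosts": ["algo-1", "algo-2", "algo-3"]}
)

# The same two lines as in the answer above: derive the node rank from
# the position of the current host in the sorted host list.
rc = json.loads(os.environ["SM_RESOURCE_CONFIG"])
os.environ["NODE_RANK"] = str(rc["hosts"].index(rc["current_host"]))

print(os.environ["NODE_RANK"])  # prints "1": algo-2 is the second host
```

Deriving the rank from the host list (rather than slicing the host name) also works if the host naming convention ever changes.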
Since your question is specific to migrating already-working code into SageMaker, using the link here as reference, I can break the process into 3 parts:
Create a PyTorch Estimator - estimator
import sagemaker
from sagemaker.pytorch import PyTorch

sagemaker_session = sagemaker.Session()
pytorch_estimator = PyTorch(
    entry_point='my_model.py',
    instance_type='ml.g4dn.16xlarge',
    instance_count=1,
    framework_version='1.7',
    py_version='py3',
    output_path=<< s3 bucket >>,
    source_dir=<< path for my_model.py >>,
    sagemaker_session=sagemaker_session)
entry_point = "my_model.py" - this part should be your existing PyTorch Lightning script. In the main method you can have something like this:
if __name__ == '__main__':
    import pytorch_lightning as pl
    trainer = pl.Trainer(
        devices=-1,  # use all available GPUs
        accelerator="gpu",
        strategy="ddp",
        enable_checkpointing=True,
        default_root_dir="/opt/ml/checkpoints",
    )
    trainer.fit(model)  # model is your LightningModule instance
Also, the link here explains the coding process very well:
https://vision.unipv.it/events/Bianchi_giu2021-Introduction-PyTorch-Lightning.pdf

SageMaker Inference for a video input

I wonder if it's possible to run a SageMaker Inference or Batch Transform job directly on a video input (.mp4 or another format)?
If not, could you please advise on the best practice that might be used for pre-processing?
Asynchronous inference could be a good option for this use case. There is a blog published by AWS that talks about how you can do this.
https://aws.amazon.com/blogs/machine-learning/run-computer-vision-inference-on-large-videos-with-amazon-sagemaker-asynchronous-endpoints/
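To make the pattern concrete, a hedged sketch of what the invocation looks like: with asynchronous inference the video stays in S3 and only its location is passed to the endpoint, which is what makes it suitable for large .mp4 payloads. The endpoint name and S3 URIs below are placeholders, not real resources:

```python
def build_async_invocation(endpoint_name, s3_video_uri):
    """Build the parameters for sagemaker-runtime invoke_endpoint_async.

    Unlike a real-time InvokeEndpoint call, the payload is not sent in the
    request body -- only its S3 location is, via InputLocation.
    """
    return {
        "EndpointName": endpoint_name,
        "InputLocation": s3_video_uri,  # e.g. s3://my-bucket/videos/clip.mp4
        "ContentType": "video/mp4",
    }


# An actual call would then look like this (requires AWS credentials and a
# deployed asynchronous endpoint, so it is commented out here):
#
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint_async(**build_async_invocation(
#     "my-video-endpoint", "s3://my-bucket/videos/clip.mp4"))
# # response["OutputLocation"] points at the S3 result once inference finishes.

params = build_async_invocation("my-video-endpoint",
                                "s3://my-bucket/videos/clip.mp4")
print(params["InputLocation"])
```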

deploy h2o.ai trained learner in snowflake

I have been reading articles suggesting that H2O.ai integrates its ML in/with Snowflake.
https://www.h2o.ai/resources/solution-brief/integration-of-h2o-driverless-ai-with-snowflake/
If I wanted to export a POJO learner like a gbm and have it run in snowflake, is there a clean way to do that? I didn't see any clear directions in the (several) articles I found.
How does that integrate with ML-ops?
One way to integrate models built in H2O.ai is through Snowflake External Functions.
This is documented at https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/snowflake-integration.html
H2O.ai also has (or will have shortly) support for deploying models into Snowflake Java UDFs, as described in https://www.h2o.ai/blog/h2o-integrates-with-snowflake-snowpark-java-udfs-how-to-better-leverage-the-snowflake-data-marketplace-and-deploy-in-database/

Continuous Training in Sagemaker

I am trying out Amazon SageMaker, and I haven't figured out how to do continuous training.
For example, if I have a CSV file in S3, I want to train each time the CSV file is updated.
I know I can go back to the notebook and re-run the whole notebook to make this happen.
But I am looking for an automated way, with some Python scripts, or using a Lambda function with S3 events, etc.
You can use the boto3 SDK for Python to start training from Lambda; then you need to trigger the Lambda when the CSV is updated.
http://boto3.readthedocs.io/en/latest/reference/services/sagemaker.html
Example python code
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model-create-training-job.html
Addition: you don't need to use Lambda; you can just run (or cron) the Python script on any kind of instance that has Python and the AWS SDK on it.
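A minimal sketch of the Lambda approach, assuming the function is subscribed to S3 ObjectCreated events on the CSV prefix. The role ARN, training image URI, bucket names, and environment variable names are all placeholders you would substitute with your own; the request shape follows the SageMaker CreateTrainingJob API:

```python
import os


def build_training_job_request(job_name, role_arn, image_uri,
                               s3_input, s3_output):
    """Build a CreateTrainingJob request for a CSV dataset in S3.

    Instance type, volume size, and runtime limit below are illustrative
    defaults, not recommendations.
    """
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "InputDataConfig": [{
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_input,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event; starts a training job."""
    import boto3  # available by default in the Lambda Python runtime
    record = event["Records"][0]["s3"]
    s3_input = f's3://{record["bucket"]["name"]}/{record["object"]["key"]}'
    request = build_training_job_request(
        job_name=f'csv-retrain-{context.aws_request_id[:8]}',
        role_arn=os.environ["SAGEMAKER_ROLE_ARN"],
        image_uri=os.environ["TRAINING_IMAGE_URI"],
        s3_input=s3_input,
        s3_output=os.environ["S3_OUTPUT_PATH"],
    )
    boto3.client("sagemaker").create_training_job(**request)
    return {"started": request["TrainingJobName"]}
```

The same build function can be reused from a cron'd script on an EC2 instance if you prefer to avoid Lambda.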
There are a couple of examples of how to accomplish this in the aws-samples GitHub.
The serverless-sagemaker-orchestration example sounds most similar to the use case you are describing. This example walks you through how to continuously train a SageMaker linear regression model for housing price predictions on new CSV data that is added daily to a S3 bucket using the built-in LinearLearner algorithm, orchestrated with Amazon CloudWatch Events, AWS Step Functions, and AWS Lambda.
There is also the similar aws-sagemaker-build example, but it might currently be more difficult to follow if you are looking for detailed instructions.
Hope this helps!

Is it alright to write software that creates a roster for a restaurant in the Java language?

I need some advice on a project I'm working on for fun during the summer. Say I'm writing software that helps create a roster based on the availabilities of the staff. I need some advice on how to implement the items below:
A database that holds the information about the staff (i.e.
availabilities, minimum hours, maximum hours).
The core of the software, where staff are arranged for each day based on the database.
A GUI that displays the final version of the roster after step 2 above, so it can be printed out by the manager.
I'm thinking of using Java but I'm not sure how to implement and connect the database, the core, and the GUI together. Can I do everything listed using Java? Can anyone please suggest a solution or an article for this?
There are many different paths you could follow, depending on what your end goals are.
Do you want to learn Java, or are you just trying to practice application development? I am assuming you are interested in an application that runs on the web.
For me, the best choices are either Java or PHP, but this is largely based on my own experiences. Others might argue that Python or Ruby would be a better to start.
For a Java based solution, you would use: Java JSPs with HTML for the front end (JavaScript and jQuery optional); Java (Servlets) for the middle tier; JDBC, JPA, Spring, and/or Hibernate to connect to the DB; MySQL is a good candidate for the DB, but there are other options.
For a PHP based solution, you would use: PHP and HTML for the front end (JavaScript and jQuery optional); PHP for the middle tier (there are frameworks you could use here as well); PHP to connect to the DB; MySQL is a good candidate for the DB, but there are other options.
If you don't know either, I think PHP is easier to set up and run for beginners, and is the basis for many open source and commercial web applications (e.g., WordPress); but Java is used for most large scale applications.