Continuous Training in SageMaker - amazon-sagemaker

I am trying out Amazon SageMaker and haven't figured out how to set up continuous training.
For example, if I have a CSV file in S3, I want to train each time the CSV file is updated.
I know we can go back to the notebook and re-run the whole notebook to make this happen.
But I am looking for an automated way, with some Python scripts, or using a Lambda function with S3 events, etc.

You can use the boto3 SDK for Python to start a training job from Lambda; then you need to trigger the Lambda whenever the CSV is updated.
http://boto3.readthedocs.io/en/latest/reference/services/sagemaker.html
Example Python code:
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model-create-training-job.html
Addition: you don't need to use Lambda; you can just run or cron the Python script on any kind of instance that has Python and the AWS SDK installed.
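For illustration, here is a minimal sketch of that Lambda handler, triggered by an S3 ObjectCreated event on the CSV; the role ARN, training image, bucket names, instance type, and hyperparameters are placeholders you would replace with your own values:

```python
import time
import boto3

sagemaker = boto3.client("sagemaker")

# Placeholders -- replace with your own values.
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerRole"
TRAINING_IMAGE = "382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:1"
OUTPUT_S3_URI = "s3://my-bucket/model-output/"


def lambda_handler(event, context):
    # The S3 event tells us which CSV object was updated.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    job_name = "csv-retrain-" + time.strftime("%Y-%m-%d-%H-%M-%S")
    sagemaker.create_training_job(
        TrainingJobName=job_name,
        AlgorithmSpecification={
            "TrainingImage": TRAINING_IMAGE,
            "TrainingInputMode": "File",
        },
        RoleArn=ROLE_ARN,
        InputDataConfig=[{
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/{key}",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        OutputDataConfig={"S3OutputPath": OUTPUT_S3_URI},
        ResourceConfig={
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
        HyperParameters={"predictor_type": "regressor"},
    )
    return {"started": job_name}
```

You would then configure an S3 event notification on the bucket so that ObjectCreated events for the CSV key invoke this function.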

There are a couple of examples of how to accomplish this in the aws-samples GitHub organization.
The serverless-sagemaker-orchestration example sounds most similar to the use case you are describing. It walks you through continuously training a SageMaker linear regression model for housing-price predictions on new CSV data added daily to an S3 bucket, using the built-in LinearLearner algorithm, orchestrated with Amazon CloudWatch Events, AWS Step Functions, and AWS Lambda.
There is also the similar aws-sagemaker-build example, but it may currently be harder to follow if you are looking for detailed instructions.
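If you would rather wire up the scheduling part yourself instead of deploying the whole sample, the CloudWatch Events piece boils down to a scheduled rule that starts the Step Functions state machine. A rough boto3 sketch, where the rule name, schedule, state machine ARN, and role ARN are placeholders:

```python
import boto3

events = boto3.client("events")

# Placeholders -- replace with your own resources.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:RetrainPipeline"
EVENTS_ROLE_ARN = "arn:aws:iam::123456789012:role/AllowEventsToStartStepFunctions"

# Create a rule that fires once a day.
events.put_rule(
    Name="daily-retrain",
    ScheduleExpression="rate(1 day)",
    State="ENABLED",
)

# Point the rule at the retraining state machine.
events.put_targets(
    Rule="daily-retrain",
    Targets=[{
        "Id": "retrain-state-machine",
        "Arn": STATE_MACHINE_ARN,
        "RoleArn": EVENTS_ROLE_ARN,
    }],
)
```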
Hope this helps!

Related

Amazon SageMaker Model Monitor for Batch Transform jobs

Couldn't find the right place to ask this, so doing it here.
Does Model Monitor support monitoring Batch Transform jobs, or only endpoints? The documentation seems to only reference endpoints...
We just launched support for this.
Here is the sample notebook:
https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker_model_monitor/model_monitor_batch_transform
Here is the What's New post:
https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-sagemaker-model-monitor-batch-transform-jobs/
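For reference, with the SageMaker Python SDK the batch transform support looks roughly like the following. This is a sketch modeled on the sample notebook above; the S3 URIs, role ARN, and cron schedule are placeholders:

```python
from sagemaker.model_monitor import (
    BatchTransformInput,
    CronExpressionGenerator,
    DefaultModelMonitor,
)
from sagemaker.model_monitor.dataset_format import MonitoringDatasetFormat

# Placeholder role ARN -- replace with your own execution role.
role = "arn:aws:iam::123456789012:role/MySageMakerRole"

monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Point the monitoring schedule at the data captured by the batch
# transform job instead of an endpoint.
monitor.create_monitoring_schedule(
    monitor_schedule_name="my-batch-transform-monitor",
    batch_transform_input=BatchTransformInput(
        data_captured_destination_s3_uri="s3://my-bucket/transform-data-capture/",
        destination="/opt/ml/processing/input",
        dataset_format=MonitoringDatasetFormat.csv(header=False),
    ),
    output_s3_uri="s3://my-bucket/monitor-reports/",
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```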

Migrating from Google App Engine Mapreduce to Apache Beam

I have been a long-time user of Google App Engine's Mapreduce library for processing data in the Google Datastore. Google no longer supports it, and it doesn't work at all in Python 3. I'm trying to migrate our older Mapreduce jobs to Google's Dataflow / Apache Beam runner, but the official documentation is awful: it just describes Apache Beam and doesn't tell you how to migrate.
In particular, my issues are these:
In Mapreduce, the jobs would run on your existing deployed application. In Beam, however, you have to create and deploy a custom Docker image to build the environment for Dataflow. Is this right?
To create a new job template in Mapreduce, you just edit a YAML file and deploy it. To create one in Apache Beam, you need to write custom runner code, deploy a template file to Google Cloud Storage, and link it up with the Docker image. Is this right?
Is the above accurate? If so, is working with Dataflow generally much more difficult than Mapreduce? Are there any libraries or tips for making this easier?
In technical terms that's what is happening, but unless you have some specific advanced use cases, you won't need to set up any custom Docker images manually. Dataflow does some work in the background to package your user code and dependencies into a container so that it can execute them on its worker VMs.
In Dataflow, writing a job template mainly requires writing some pipeline code in your chosen language (Java or Python), and possibly writing some metadata. Once your code is written, creating and staging the template itself isn't much different from running a normal Dataflow job. There's a page documenting the process.
I agree the page on Mapreduce to Beam migration is very sparse and unhelpful, although I think I understand why that is. Migrating from Mapreduce to Beam isn't a straightforward 1:1 migration where only the syntax changes. It's a different pipeline model and most likely will require some level of rewriting your code for the migration. A migration guide that fully covered everything would end up repeating most of the existing documentation.
Since it sounds like most of your questions are around setting up and executing Beam pipelines, I encourage you to begin with the Dataflow quickstart in your chosen language. It won't teach you how to write pipelines, but will teach you how to set up your environment to write and run pipelines. There are links in the quickstarts which direct you to Apache Beam tutorials that teach you the Beam API and how to write your own pipelines, and those will be useful for rewriting your Mapreduce code in Beam.
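To give a feel for what the pipeline code looks like, here is a minimal word-count style Beam pipeline in Python; the gs:// paths are placeholders, and the same code runs locally or on Dataflow depending on the runner options you pass:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Pass --runner=DataflowRunner (plus --project, --region, --temp_location)
# on the command line to execute this same code on Dataflow instead of
# locally with the DirectRunner.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")     # placeholder path
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Format" >> beam.MapTuple(lambda word, count: f"{word},{count}")
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/counts")   # placeholder path
    )
```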

How to run a Python file every day in GCP (Google Cloud Platform)?

I have written Python code to move data from Firestore to BigQuery.
How can I run this code at a specified time every day?
Please help, I'm a beginner.
The most economical way would be to use Google Cloud Scheduler. It can initiate a job that runs on a schedule, similar to cron. Then, via Pub/Sub, it can invoke a Google Cloud Function with your code.
Here is a tutorial describing exactly that: https://cloud.google.com/scheduler/docs/tut-pub-sub Just use Python instead of JavaScript for the runtime.
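As a rough sketch, the Python (1st gen) Cloud Function invoked by that Pub/Sub topic could look like this; export_firestore_to_bigquery is a hypothetical stand-in for your existing Firestore-to-BigQuery code:

```python
import base64


def export_firestore_to_bigquery():
    """Placeholder for your existing Firestore -> BigQuery code."""
    pass


def main(event, context):
    """Entry point for a background Cloud Function triggered by Pub/Sub.

    Cloud Scheduler publishes a message to the topic on your cron
    schedule, which invokes this function.
    """
    payload = ""
    if "data" in event:
        payload = base64.b64decode(event["data"]).decode("utf-8")
    print(f"Triggered by Cloud Scheduler with payload: {payload!r}")

    export_firestore_to_bigquery()
```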

Saving Trained AI Models in Google Colab

After training a Twin Delayed DDPG (TD3) agent in Google Colab for 10 hours, I downloaded the Python notebook file to continue the work on another platform. The problem, however, is that the training results are not included when I save the notebook file, so they were lost. How can I save the trained agent and move it, for example, to the Unity 3D environment without losing the training, so I don't have to re-train the agent?
I sincerely appreciate any answers, comments, thoughts, etc.!
Store files that you want to persist across sessions in Drive.
Here's a snippet showing how to mount your Google Drive as a FUSE filesystem in Colab:
https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
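In practice that looks something like the cell below. The torch.save lines are an assumption about a PyTorch-based TD3 agent, so adapt them to whatever serialization your implementation uses:

```python
from google.colab import drive

# Mount Google Drive at /content/drive so files written there persist
# after the Colab session ends.
drive.mount("/content/drive")

# Example (assumption: a PyTorch TD3 agent with an `actor` network):
# import torch
# torch.save(actor.state_dict(), "/content/drive/MyDrive/td3_actor.pth")
# ...and on another machine, load it back with:
# actor.load_state_dict(torch.load("td3_actor.pth"))
```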

Does AWS have an image processing service?

In the past I have used ImageMagick to build a web app that performs some image processing.
Then I came across this from Google App Engine:
https://cloud.google.com/appengine/docs/java/images/
It looks quite interesting. However, I would like to work with AWS-related technologies, and I wonder whether AWS has a similar service?
Thanks,
Yair
No, they don't have an image processing service. They do have transcoding services for audio and video, though:
http://docs.aws.amazon.com/elastictranscoder/latest/developerguide/introduction.html
But that's not what you are looking for. The closest you can get with AWS is probably to create your own on-demand instance that processes a batch of images and then stops again. Or have a look at Data Pipeline or Simple Workflow. Those might be overkill for what you want to achieve, though, depending on the scale.
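If you go the on-demand instance route, here is a rough boto3 sketch of launching an instance that processes a batch with ImageMagick and then terminates itself; the AMI ID, bucket names, instance profile, and convert command are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Shell script the instance runs on boot: fetch images, process them with
# ImageMagick, upload the results, then shut down (which terminates the
# instance because of InstanceInitiatedShutdownBehavior below).
user_data = """#!/bin/bash
yum install -y ImageMagick
aws s3 sync s3://my-bucket/incoming /tmp/in
mkdir -p /tmp/out
for f in /tmp/in/*; do
  convert "$f" -resize 800x800 "/tmp/out/$(basename "$f")"
done
aws s3 sync /tmp/out s3://my-bucket/processed
shutdown -h now
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                   # placeholder AMI (e.g. Amazon Linux)
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
    InstanceInitiatedShutdownBehavior="terminate",
    IamInstanceProfile={"Name": "image-batch-role"},   # placeholder profile with S3 access
)
```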
