Is it possible to integrate AWS SageMaker and Delta Lake? - amazon-sagemaker

Is it possible to integrate AWS SageMaker and Delta Lake?
Thanks,
Ramabadran

Yes, though it depends on which part of SageMaker you mean (Training, Notebook, Inference, etc.).
Last week, an integration between SageMaker and Delta Lake was documented here (a custom Docker container used with the SageMaker Processing API):
https://github.com/eitansela/sagemaker-delta-sharing-demo/tree/main/delta_lake_bring_your_own_container_processing
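For orientation, here is a minimal sketch of that pattern (a custom image run through the SageMaker Processing API). The image URI, role, script name, and paths are placeholders, not the ones from the linked repo:

    # Sketch: run a SageMaker Processing job whose custom image bundles the
    # `deltalake` (delta-rs) package so the processing script can read a Delta table.
    # Image URI, script name, and paths are placeholders.
    import sagemaker
    from sagemaker.processing import ScriptProcessor, ProcessingOutput

    role = sagemaker.get_execution_role()

    processor = ScriptProcessor(
        image_uri="<account>.dkr.ecr.<region>.amazonaws.com/delta-processing:latest",
        command=["python3"],
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    processor.run(
        code="process_delta.py",  # e.g. deltalake.DeltaTable("s3://bucket/table").to_pandas()
        outputs=[ProcessingOutput(source="/opt/ml/processing/output")],
    )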

Related

Pattern discovery in AWS SageMaker

How can I run "pattern discovery" on my dataset using AWS SageMaker?
Also, is there a similar term to "pattern discovery"? I can't find much information about it.
The concept lies downstream of Data Wrangling.
As can be seen in the official guide "Prepare ML Data with Amazon SageMaker Data Wrangler":
Amazon SageMaker Data Wrangler (Data Wrangler) is a feature of Amazon
SageMaker Studio that provides an end-to-end solution to import,
prepare, transform, featurize, and analyze data. You can integrate a
Data Wrangler data preparation flow into your machine learning (ML)
workflows to simplify and streamline data pre-processing and feature
engineering using little to no coding. You can also add your own
Python scripts and transformations to customize workflows.
Here you will find a few tools for pattern recognition/discovery (and general data analysis) that can be used in SageMaker Studio: "Analyze and Visualize".
My advice as a data scientist is not to use no-code tools if you don't fully understand the nature of the data you are dealing with. A good knowledge of the data is a prerequisite for any kind of targeted analysis. Try writing some custom code instead, so you have maximum control over each operation.
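As a starting point, here is a minimal sketch of that kind of custom analysis (summary statistics, correlations, and unsupervised clustering); the S3 path and cluster count are placeholders to adapt to your dataset:

    # Sketch of "pattern discovery" with custom code instead of a no-code tool:
    # basic statistics, correlations, and K-means clustering on the numeric columns.
    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    df = pd.read_csv("s3://my-bucket/my-dataset.csv")  # placeholder path; needs s3fs in Studio

    print(df.describe())                 # distributions of each column
    print(df.corr(numeric_only=True))    # linear relationships between features

    numeric = df.select_dtypes("number").dropna(axis=1)
    X = StandardScaler().fit_transform(numeric)
    df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

    # Inspect each cluster's averages to see what pattern it captures
    print(df.groupby("cluster").mean(numeric_only=True))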

Train Amazon SageMaker object detection model on local PC

I wonder if it's possible to train an Amazon SageMaker object detection model on a local PC?
You're probably referring to this object detection algorithm, which is part of the Amazon SageMaker built-in algorithms. Built-in algorithms must be trained in the cloud.
If you're bringing your own TensorFlow or PyTorch model, you could use SageMaker training jobs to train either in the cloud or locally, as #kirit noted.
I would also look at SageMaker JumpStart for a wide variety of object detection algorithms that are TF/PT based.
You can use SageMaker Local Mode to run SageMaker training jobs locally on your PC. Here is a list of examples: https://github.com/aws-samples/amazon-sagemaker-local-mode
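A minimal sketch of Local Mode with a bring-your-own PyTorch script (the entry point, role ARN, and data path are placeholders; Docker and the sagemaker[local] extra need to be installed):

    # Sketch: train a custom PyTorch script locally via SageMaker Local Mode.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",   # your own training script (placeholder name)
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
        framework_version="1.13",
        py_version="py39",
        instance_count=1,
        instance_type="local",    # "local_gpu" if a GPU is available
    )

    # file:// channels keep the data on disk instead of S3
    estimator.fit({"training": "file://./data"})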

Is there any decentralized cloud project compatible with Polkadot's Substrate?

When developing a DApp, one has to put part of the tech on centralized services such as AWS or Google Cloud.
I am wondering if there are any other projects that allow you to decentralize databases and computational power in order to have fully decentralized and unstoppable DApps?
You might want to check the Tea Project, since it offers:
a decentralized cloud
rich and fully decentralized dApps
a trusted computation environment
For instance, IPFS offers decentralized storage but lacks a decentralized compute layer to go with it. Projects like Helium decentralize data transmission but are missing a compute layer to run dApps directly on network data. The Tea Project aims to solve the current issues of the decentralized cloud ecosystem. You can check the website and the infographics attached below.
[Image: Tea Project infographics]
Some decentralized solutions are already emerging to decentralize databases and computational power.
For instance, Aleph.im is a cross-chain scalability project that does exactly that: decentralized database and computation, compatible with Substrate/Polkadot, Ethereum, Binance Chain, and more.
Check out its SDKs in JavaScript and Python.

How can I deploy AWS SageMaker Linear Learner Model in a Local Environment

I have trained an AWS SageMaker model using the built-in Linear Learner algorithm. I can download the trained model artifacts (model.tar.gz) from S3.
How can I deploy the model in a local environment, independent of AWS, so I can make prediction/inference calls without internet access?
Matx, there is no local mode for built-in algorithms. However, you can programmatically load the MXNet module with the model weights and use it to make predictions. Check https://forums.aws.amazon.com/thread.jspa?messageID=827236&#827236 for a code example.
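Roughly, the approach from that thread looks like the sketch below. The inner file names (model_algo-1, mx-mod-*) are assumptions about what the Linear Learner artifact contained at the time of that thread, so verify them against your own model.tar.gz:

    # Sketch: unpack a Linear Learner model.tar.gz and load the MXNet module inside.
    # Assumes the artifact holds an inner archive `model_algo-1` containing
    # `mx-mod-symbol.json` / `mx-mod-0000.params`; check your own artifact first.
    import tarfile
    import zipfile
    import numpy as np
    import mxnet as mx

    with tarfile.open("model.tar.gz") as tar:
        tar.extractall()
    with zipfile.ZipFile("model_algo-1") as z:
        z.extractall("model")

    num_features = 10  # set to the feature dimension the model was trained on

    # Module.load expects <prefix>-symbol.json and <prefix>-%04d.params files
    mod = mx.mod.Module.load("model/mx-mod", 0, label_names=None)
    mod.bind(for_training=False, data_shapes=[("data", (1, num_features))])

    x = mx.nd.array(np.random.rand(1, num_features).astype("float32"))
    scores = mod.predict(mx.io.NDArrayIter(x))
    print(scores.asnumpy())  # raw Linear Learner output for the sample row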

What is the equivalent of AWS elastic beanstalk in GCP?

Is Elastic Beanstalk of AWS an answer to GAE of GCP?
Are there any benchmark comparisons between the two?
Is Elastic Beanstalk of AWS an answer to GAE of GCP?
Yes, in a nutshell, you can think of it like that.
There's an article by Google showing side-by-side comparisons of AWS and Google Cloud products, listing them as IaaS and PaaS:
Service   AWS                            Google Cloud Platform
--------------------------------------------------------------
IaaS      Amazon Elastic Compute Cloud   Compute Engine
PaaS      AWS Elastic Beanstalk          App Engine
However, it is worth noting that AWS Elastic Beanstalk, at least, is not strictly a PaaS solution; I would describe it more as a management layer on top of EC2. From their FAQ:
Q: How is AWS Elastic Beanstalk different from existing application containers or platform-as-a-service solutions?
Most existing application containers or platform-as-a-service solutions, while reducing the amount of programming required, significantly diminish developers’ flexibility and control. Developers are forced to live with all the decisions predetermined by the vendor–with little to no opportunity to take back control over various parts of their application’s infrastructure. However, with AWS Elastic Beanstalk, developers retain full control over the AWS resources powering their application. If developers decide they want to manage some (or all) of the elements of their infrastructure, they can do so seamlessly by using Elastic Beanstalk’s management capabilities.
I don't think it's possible to compare these platforms with benchmarks. There is still a lot you can configure on these platforms, and performance will mostly depend on your configuration and (even more) your application code.
It comes down to differences in pricing, ease of use, and the availability of other services on these platforms.
