SageMaker doesn't run inference in an async manner - amazon-sagemaker

I've deployed a custom model behind an async endpoint. I want to process video files with it; because a video can be ~5-10 minutes long, I can't load all of its frames into memory. Of course, I want to run inference on each frame.
I've written:
input_fn - downloads the video file from S3 using boto3 and creates a generator (built with OpenCV) that yields video frames in batches of a given size; returns that generator
predict_fn - iterates over the generator's batched frames, runs the model on each batch, and collects the predictions in a list
output_fn - transforms the predictions into JSON and gzips everything to reduce the size
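A simplified sketch of this layout (not my exact code; the request format, batch size, and frame preprocessing below are just for illustration):

# Sketch of the handler layout described above. The request format
# ({"s3_uri": ...}), batch size, and preprocessing are illustrative only.
import gzip
import json
import os
import tempfile

import boto3
import cv2
import numpy as np
import torch


def input_fn(request_body, content_type="application/json"):
    payload = json.loads(request_body)
    bucket, key = payload["s3_uri"][len("s3://"):].split("/", 1)
    local_path = os.path.join(tempfile.mkdtemp(), "video.mp4")
    boto3.client("s3").download_file(bucket, key, local_path)

    def frame_batches(batch_size=32):
        cap = cv2.VideoCapture(local_path)
        batch = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            batch.append(frame)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch
        cap.release()

    return frame_batches()  # a generator, so frames never all sit in memory


def predict_fn(frame_batches, model):
    device = next(model.parameters()).device
    predictions = []
    with torch.no_grad():
        for batch in frame_batches:
            # HWC uint8 frames -> NCHW float tensor; adapt to your model's input
            tensor = torch.as_tensor(np.stack(batch)).permute(0, 3, 1, 2).float() / 255.0
            predictions.extend(model(tensor.to(device)).cpu().numpy().tolist())
    return predictions


def output_fn(predictions, accept="application/json"):
    # gzip the JSON payload to keep the async output object small
    return gzip.compress(json.dumps(predictions).encode("utf-8"))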
The endpoint works well, but the problem is concurrency. The SageMaker endpoint processes one request after another (judging by CloudWatch and the S3 output file timestamps), and I don't know why.
max_concurrent_invocations_per_instance is set to 1000. The other PyTorch serving settings are as follows:
SAGEMAKER_MODEL_SERVER_TIMEOUT: 100000
SAGEMAKER_TS_MAX_BATCH_DELAY: 10000
SAGEMAKER_TS_BATCH_SIZE: 1000
SAGEMAKER_TS_MAX_WORKERS: 4
SAGEMAKER_TS_RESPONSE_TIMEOUT: 100000
And still, it doesn't work. So how can I create an async inference endpoint with PyTorch to get concurrency?

Concurrency for the TorchServe DLC is controlled by mechanisms such as the number of workers, which can be set by defining the appropriate environment variables, i.e. the SAGEMAKER_TS_* and SAGEMAKER_MODEL_* families (see, e.g., this page for details on their meaning and implications).
While the latter are agnostic to any particular serving stack and are defined in the SageMaker Inference Toolkit, the former are TorchServe-specific and are defined in the TorchServe Inference Toolkit. Moreover, since the TorchServe Inference Toolkit is built on top of the SageMaker Inference Toolkit, there is a non-trivial interplay between the two sets of parameters.
You may therefore also want to experiment with parameters such as SAGEMAKER_MODEL_SERVER_WORKERS to properly configure the concurrency of the SageMaker async endpoint.
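For example, a deployment along these lines lets you set both the model-server workers and the async concurrency limit (a sketch only; the entry point, instance type, S3 paths, versions, and worker counts are placeholders, not recommendations):

# Sketch of deploying a PyTorch async endpoint with explicit worker settings.
# All names, paths, and numbers below are placeholders.
from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    entry_point="inference.py",          # contains input_fn / predict_fn / output_fn
    framework_version="1.12",
    py_version="py38",
    env={
        # TorchServe-specific (TorchServe Inference Toolkit)
        "SAGEMAKER_TS_MAX_WORKERS": "4",
        "SAGEMAKER_TS_RESPONSE_TIMEOUT": "100000",
        # serving-stack-agnostic (SageMaker Inference Toolkit)
        "SAGEMAKER_MODEL_SERVER_WORKERS": "4",
        "SAGEMAKER_MODEL_SERVER_TIMEOUT": "100000",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    async_inference_config=AsyncInferenceConfig(
        output_path="s3://my-bucket/async-output/",
        max_concurrent_invocations_per_instance=4,  # tune to your workload
    ),
)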

Related

Sagemaker Inference Endpoint with fitted Encoder

Since I haven't gotten anywhere by reading documentation and blog posts, I'll ask here:
I want to deploy a SageMaker endpoint by fitting a SageMaker Pipeline, i.e. an endpoint backed by a PipelineModel. This PipelineModel should consist of two models: a fitted model that encodes my data and a model that predicts with an XGBoost estimator. I am following along with this documentation.
But that example doesn't show how to integrate the fitted preprocessor model into a pipeline step. Which step do I have to use? A TrainingStep? Thanks in advance; I am desperate.
Check out this official example: Train register and deploy a pipeline model.
There are two variations to keep in mind:
For models that need training (usually those based on TensorFlow/PyTorch), a TrainingStep must be used so that the output (the model artifact) is correctly (and automatically) generated and can be used later for inference.
For models produced by simply fitting on the data (e.g., a scaler with sklearn), you could create a TrainingStep in disguise (an extra component in the pipeline; not really correct, but a workaround). The cleaner method is to configure the preprocessing script so that it saves a model.tar.gz containing the necessary files (e.g., pickle or joblib objects), which can then be used in later steps as model_data. In fact, once you have a model.tar.gz, you can define a Model of various types (e.g., an SKLearnModel) that is already fitted.
At this point, you define your PipelineModel with the trained/fitted models and can either proceed to direct endpoint deployment or go through the model registry for a more robust approach.
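As a rough sketch (the S3 paths, framework versions, entry points, and instance type are placeholders), combining an already-fitted sklearn preprocessor with an XGBoost model could look like this:

# Sketch of a PipelineModel backed by a fitted sklearn preprocessor and an
# XGBoost model. All paths, versions, and names below are placeholders.
import sagemaker
from sagemaker.pipeline import PipelineModel
from sagemaker.sklearn import SKLearnModel
from sagemaker.xgboost import XGBoostModel

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"

# model.tar.gz produced by the preprocessing script (contains the pickled /
# joblib-dumped encoder plus its inference script)
preprocessor = SKLearnModel(
    model_data="s3://my-bucket/preprocessor/model.tar.gz",
    role=role,
    entry_point="preprocess_inference.py",
    framework_version="1.0-1",
    sagemaker_session=session,
)

# model.tar.gz produced by the XGBoost TrainingStep / estimator
xgb = XGBoostModel(
    model_data="s3://my-bucket/xgboost/model.tar.gz",
    role=role,
    entry_point="xgb_inference.py",
    framework_version="1.5-1",
    sagemaker_session=session,
)

pipeline_model = PipelineModel(
    models=[preprocessor, xgb],   # executed in this order at inference time
    role=role,
    sagemaker_session=session,
)

# Either deploy directly ...
pipeline_model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
# ... or register it in the model registry instead (see the linked example).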

REACT return WAV audio blob to backend, Streamlit custom component

I have built a custom component (React-based) for the Streamlit framework that lets the user record audio in the browser of their choice. Please have a look at the current version of streamlit-audio-recorder here.
As a mediocre web developer, I have not succeeded in converting the audio data stored in the browser's cache (an audio Blob object) so that I can return it to the Streamlit backend.
What I have tried so far & my thought process:
There are various scripts that save audio to a local disk. However, none of these solutions work in an online-deployed scenario (the program would save to the server's disk instead of the user's). This is why I concluded that the solution has to work with the audio data that is stored in the user's browser cache after recording.
The data stored in this cache as an audio Blob cannot be passed directly back to Python as a return variable and needs to be converted to an "environment-agnostic datatype" (I tried base64-encoded binary). In my experience this conversion becomes extremely slow as the recording gets longer. I therefore considered splitting the audio Blob into slices which could then be converted, aggregated, and returned to Python. However, I was unable to implement this splitting and concatenation of WAV audio blobs because of the data structure/metadata inside the WAV file and the lack of libraries for slicing audio Blobs.
Does somebody know of a more elegant and performant solution? That would make it possible to finalize the audio recorder component and provide immense value to the Streamlit community, which currently lacks comparable functionality.
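To illustrate what I am trying to achieve on the Python side (a sketch only; st_audiorec is a simplified stand-in for my actual wrapper, and it assumes the React side can eventually return the recording as a base64 string via Streamlit.setComponentValue):

# Sketch of the Python side, assuming the React component returns the
# recording as a base64-encoded WAV string. st_audiorec is a hypothetical
# wrapper created with components.declare_component.
import base64

import streamlit as st
import streamlit.components.v1 as components

# Hypothetical: points at the built React component
_component = components.declare_component("st_audiorec", path="frontend/build")


def st_audiorec():
    # The component is assumed to call Streamlit.setComponentValue(<base64 str>)
    return _component(default=None)


audio_b64 = st_audiorec()
if audio_b64:
    wav_bytes = base64.b64decode(audio_b64)
    st.audio(wav_bytes, format="audio/wav")
    # wav_bytes could now be written to disk, sent to an API, etc.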

Firebase cloud functions compress video

I have successfully implemented video sharing in my app using React Native and Firebase, but I want to ensure that stored videos are no more than 1080x1080 (maybe 720, depending on how it looks).
Videos are at most 8 seconds long, and I am trying my best to keep them under 5 MB each if possible. I was able to do some compression on the client side (crop to square/trim), but I am hoping to compress the videos even more, without losing too much quality, via Cloud Functions (storage trigger).
After looking around, MoviePy seems like a good option, but it uses Python and I am not sure how I can use this script inside a Cloud Function storage trigger.
Here is what that looks like:
# Not sure how this will import
import moviepy.editor as mp

# Can I get the video here from the bucket path in a cloud function?
clip = mp.VideoFileClip("video-stored.mp4")

# Make the height 1080px (according to the MoviePy documentation, the width
# is then computed so that the width/height ratio is preserved)
clip_resized = clip.resize(height=1080)

# Resize the video, then store it in the same location (same file path)
clip_resized.write_videofile("video-stored-resized.mp4")
I would love to hear some suggestions regarding video compressing via a cloud function and thoughts on using the above script/module with cloud functions.
At this point, Firebase Functions does not support languages other than Node.js.
Thus, there are two options:
1. If you would like to keep using MoviePy: write the part that reacts to the Firebase Storage trigger in Node.js, and host the Python MoviePy API in whatever environment you prefer (I guess you would use Pub/Sub, provided by GCP, to call the Python API).
2. Write all parts in Node.js. There is a great Node.js module called fluent-ffmpeg, but I know that adding a watermark with it is not as easy as with MoviePy...
By the way, when I tried to combine videos in Firebase Functions, I was not able to make it work, maybe because of the limitations of the environment.
So I personally recommend the first option; of course, it depends on your situation, for example how big your video files are.
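To sketch the first option a bit more concretely: if you can trigger the Python part directly from a storage event via GCP Cloud Functions (which, unlike the Firebase Functions tooling, do offer a Python runtime), the function might look roughly like this. The bucket layout, the "-resized" suffix, and the 720px target are assumptions:

# Sketch of a GCS-triggered (1st gen, background) Cloud Function in Python
# that resizes an uploaded video with MoviePy. Names and sizes are illustrative.
import os
import tempfile

import moviepy.editor as mp
from google.cloud import storage


def resize_video(event, context):
    bucket_name = event["bucket"]
    file_path = event["name"]

    # Avoid an infinite trigger loop on the already-resized output
    if file_path.endswith("-resized.mp4"):
        return

    client = storage.Client()
    bucket = client.bucket(bucket_name)

    local_in = os.path.join(tempfile.gettempdir(), "in.mp4")
    local_out = os.path.join(tempfile.gettempdir(), "out.mp4")
    bucket.blob(file_path).download_to_filename(local_in)

    clip = mp.VideoFileClip(local_in)
    # The width is computed automatically to preserve the aspect ratio
    clip.resize(height=720).write_videofile(local_out)
    clip.close()

    resized_path = file_path.rsplit(".", 1)[0] + "-resized.mp4"
    bucket.blob(resized_path).upload_from_filename(local_out)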

Building Apache Camel Routes Dynamically

I am working on an application that uses Apache Camel to flow a single request message (input) through some initial Camel components/logic and then to a multicast at which point the route branches out into multiple branches. The purpose of each branch is to retrieve data from a specific web service (or other back-end data source, e.g. database) and then after the web service invocation / data retrieval operation completes, each of the branches dumps its output data in the same way via a custom bean endpoint. I expect to eventually have approximately 40 different branches in the Camel route, each of which might flow through a different set of Camel components in order to prepare its request, submit the request, process the response, etc... I anticipate that a fair number of the branches will be quite similar (e.g. all SOAP invocations quite similar, all REST invocations quite similar, etc.) and so have concocted an approach whereby a config file stores the list of back-end data sources to be invoked/retrieved-from along with the ability to define (indirectly) the route that should be taken to reach each of those sources. The config file looks something like this:
[a]
route=X + Y
Y.url=http://someservice
[b]
route=Z
Z.someproperty=123
I then have code that reads through that config file, treats each of the "sections" (e.g. "[a]", "[b]", etc.) as a branch (i.e. a destination out of the multicast), and relies on dynamically instantiated classes (e.g. XRouteSegment, YRouteSegment, ZRouteSegment), each of which in turn populates/defines the route for its specific branch. As some examples, I have built RouteSegment helper classes for wiring up components such as Velocity, CXF, and CXF-RS, and for data marshalling/unmarshalling, based on properties set in the config file.
As far as initialization of the Camel context goes, it starts out in a fairly typical way with a single RouteBuilder which builds the first part of the route up to the multicast. Then I loop through all of the sources found in the config file (e.g. "a", "b", etc.) and create seda nodes for each of them, which the multicast flows to. Next I call into each of the RouteSegment instances associated with a given source (e.g. X + Y) and allow them to add to the RouteDefinition as they need (e.g. from their seda start point going forward). Finally, back in my "main" RouteBuilder, I tack on some final routing/components that are the same for all of the branches (i.e. the logic that forces each branch to store its data via the same custom bean).
The code works just fine, but I am questioning whether this approach is overkill and/or whether there is some easier/cleaner way of doing this that I am overlooking. Would I be better off just having individual Java classes (i.e. RouteBuilders) for each of the branches (in addition to the "trunk" and "tails" of the route)? What I was trying to avoid was having too much duplicated logic/code across all of those classes ... e.g. 20 classes all pulling data from SOAP web services in pretty much exactly the same way. So I am using a RouteSegment instance like "X" referenced above as re-usable shorthand for what would otherwise be a sequence of different Camel Java DSL calls (e.g. from/to/process/log/etc ... with parameters to control the specifics of the individual statements). Are there any other strategies/approaches I should consider in order to dynamically build out a Camel route (+ sizable number of branches) at runtime (e.g. within a for loop, or via some sort of reflection/discovery process (app runs using Spring Boot))?
Thanks in advance for any ideas you might be able to provide/suggest that I might not have thought of / tried yet!
I just want to throw in a few points that I am missing in your description.
If I understand your description correctly, the branch components called by the multicast are not real components but rather building blocks used to assemble Camel routes at runtime. That sounds like they are neither testable nor startable standalone (but perhaps you just did not explain that aspect).
If you built individual small components (each with its own RouteBuilder), you would have something similar to a microservice architecture: small units to develop and deploy individually.
Since you use Spring Boot, you have auto-discovery of routes, so they are kind of "pluggable". The components are also testable using Camel route tests, etc.
The components would be much more "static" and small standalone projects. This also ensures a fast development roundtrip when you work on the components.
But as you write, this can lead to lots of redundant code. So I guess you have to decide what is more important for you.

Google App Engine Large File Upload

I am trying to upload data to Google App Engine (using GWT). I am using the FileUploader widget and the servlet uses an InputStream to read the data and insert directly to the datastore. Running it locally, I can upload large files successfully, but when I deploy it to GAE, I am limited by the 30 second request time. Is there any way around this? Or is there any way that I can split the file into smaller chunks and send the smaller chunks?
By using the Blobstore you get a 1 GB size limit and a special handler, unsurprisingly called BlobstoreUploadHandler, that shouldn't give you timeout problems on upload.
Also check out http://demofileuploadgae.appspot.com/ (source code, source answer), which does exactly what you are asking.
Also, check out the rest of GWT-Examples.
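The question uses GWT/Java servlets, but just to illustrate the mechanism, this is roughly what the Blobstore upload flow looks like on the legacy Python runtime (routes and handler names are placeholders; the browser posts the file straight to the Blobstore, so the transfer itself is not subject to the 30-second limit):

# Sketch of the classic Blobstore upload flow (legacy Python runtime).
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers


class StartUpload(webapp2.RequestHandler):
    def get(self):
        # The form posts directly to the Blobstore, not to the application
        upload_url = blobstore.create_upload_url("/upload-complete")
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>' % upload_url
        )


class UploadComplete(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        blob_info = self.get_uploads("file")[0]  # BlobInfo of the stored blob
        self.redirect("/serve/%s" % blob_info.key())


class Serve(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        self.send_blob(blob_key)


app = webapp2.WSGIApplication([
    ("/", StartUpload),
    ("/upload-complete", UploadComplete),
    (r"/serve/([^/]+)", Serve),
])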
Currently, GAE imposes a limit of 10 MB on file uploads (and on response size), as well as 1 MB limits on many other things; so even if you had a network connection fast enough to pump more than 10 MB through within a 30-second window, it would be to no avail. Google has said (I heard Guido van Rossum mention it yesterday here at Pycon Italia Tre) that it has plans to overcome these limitations in the future (at least for GAE users who pay per-use to exceed quotas; I'm not sure whether the plans extend to GAE users who don't pay and generally need to accept smaller quotas for their free use of GAE).
You would need to do the upload to another server - I believe the 30-second timeout cannot be worked around. If there is a way, please correct me! I'd love to know how!
If your request is running out of request time, there is little you can do. Maybe your files are too big and you will need to chunk them on the client (with something like Flash or Java or an upload framework like pupload).
Once you get the file to the application there is another issue - the datastore limitations. Here you have two options:
you can use the Blobstore service, which has quite a nice API for handling uploads as large as 50 megabytes
you can use something like bigblobae, which can store blobs of virtually unlimited size in the regular App Engine datastore.
The 30-second response time limit only applies to code execution, so the upload of the actual file as part of the request body is excluded from it. The timer only starts once the client has fully sent the request to the server and your code starts handling it. Hence it doesn't matter how slow your client's connection is.
Uploading file on Google App Engine using Datastore and 30 sec response time limitation
The closest you could get would be to split it into chunks as you store it in GAE and then when you download it, piece it together by issuing separate AJAX requests.
I would agree with chunking the data into smaller blobs and having two tables: one containing the metadata (filename, size, number of downloads, etc.) and the other containing the chunks, which are associated with the metadata table by a foreign key. I think it is doable...
Or, once you have uploaded all the chunks, you can simply put them together into one blob and keep a single table.
But the problem is that you will need a thick client to do the chunking, like a Java applet, which needs to be signed and trusted by your clients so it can access the local file system.
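A rough sketch of that metadata-plus-chunks layout, shown with the legacy Python datastore API purely to illustrate the data model (the question is Java/GWT, and the chunk size and names here are arbitrary):

# Sketch of the metadata + chunks scheme described above (legacy Python
# datastore API; each chunk stays under the ~1 MB entity size limit).
from google.appengine.ext import db

CHUNK_SIZE = 900 * 1024  # bytes per chunk, kept below the entity limit


class FileMeta(db.Model):
    filename = db.StringProperty()
    size = db.IntegerProperty()


class FileChunk(db.Model):
    meta = db.ReferenceProperty(FileMeta, collection_name="file_chunks")
    index = db.IntegerProperty()
    data = db.BlobProperty()


def store_file(filename, payload):
    meta = FileMeta(filename=filename, size=len(payload))
    meta.put()
    for i in range(0, len(payload), CHUNK_SIZE):
        FileChunk(meta=meta, index=i // CHUNK_SIZE,
                  data=db.Blob(payload[i:i + CHUNK_SIZE])).put()
    return meta.key()


def load_file(meta_key):
    meta = FileMeta.get(meta_key)
    chunks = FileChunk.all().filter("meta =", meta).order("index")
    return "".join(chunk.data for chunk in chunks)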
