Deploy an externally trained TensorFlow model artefact - amazon-sagemaker

I would like to host a TensorFlow model (trained on my local PC) on SageMaker, and I followed this article: https://aws.amazon.com/fr/blogs/machine-learning/bring-your-own-pre-trained-mxnet-or-tensorflow-models-into-amazon-sagemaker/
However, I'm not able to perform inference. My model is an object detection model; locally it generated frozen_inference_graph.pb, model.ckpt.data-00000-of-00001, model.ckpt.index, and model.ckpt.meta. All these files were converted into saved_model.pb plus variables.data-00000-of-00001 and variables.index, then packed into a .tar.gz respecting the folder structure export/Servo/version/...
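For reference, that folder structure implies an archive layout like the following (the version number "1" is just an example; the file names are the standard TensorFlow SavedModel artifacts):

export/
└── Servo/
    └── 1/
        ├── saved_model.pb
        └── variables/
            ├── variables.data-00000-of-00001
            └── variables.index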
The .tar.gz was manually uploaded to S3, and I have successfully created an endpoint.
But when I try to perform inference, I get an error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (415) from model with message "{"error": "Unsupported Media Type: Unknown"}"
My input data are images. I need help.
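A 415 from the TensorFlow Serving container means the request's ContentType was not one the container understands. As a minimal sketch (the endpoint name and input shape are placeholders, and it is assumed the serving signature accepts a batched image tensor), sending the image as application/json should be accepted:

import json
import boto3
import numpy as np

runtime = boto3.client("sagemaker-runtime")

# Placeholder input: replace with your actual preprocessed image batch.
image_array = np.zeros((1, 300, 300, 3), dtype=np.uint8)

response = runtime.invoke_endpoint(
    EndpointName="my-tf-endpoint",       # placeholder endpoint name
    ContentType="application/json",      # a type the TF Serving container accepts
    Body=json.dumps({"instances": image_array.tolist()}),
)
result = json.loads(response["Body"].read())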

Related

How do you resolve an "Access Denied" error when invoking `image_uris.retrieve()` in AWS Sagemaker JumpStart?

I am working in a SageMaker environment that is locked down. For example, my user account is prevented from creating S3 buckets. But I can successfully run vanilla ML training jobs by passing role=get_execution_role() to an instance of the Estimator class when using an out-of-the-box algorithm such as XGBoost.
Now I'm trying to use an algorithm (LightGBM) that is only available via the JumpStart feature in SageMaker, but I can't get it to work. When I try to retrieve an image URI via image_uris.retrieve(), it returns the following error:
ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied.
This makes some sense to me if my user permissions are being used when creating an object. But what I want to do is specify another role - like the one returned by get_execution_role() - to perform these tasks.
Is that possible? Is there another workaround available? How can I see which role is being used?
Thanks,
When I encountered this issue, it was a permissions issue with a bucket that had changed.
In the SageMaker Python SDK source code, there is a cache located in an AWS-owned bucket, jumpstart-cache-prod-{region}, along with a manifest.json that resolves the ECR path for the image for you.
If you look at the stack trace, it could be erroring out at the code that is looking for the manifest.
One place to look is whether new restrictions have been placed in IAM. Included below is the minimum policy you need to access JumpStart (pretrained) models.
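As a rough sketch, a read-only policy scoped to the JumpStart cache bucket named above might look like this (the exact actions and ARN patterns here are assumptions, not the official policy):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::jumpstart-cache-prod-*",
                "arn:aws:s3:::jumpstart-cache-prod-*/*"
            ]
        }
    ]
}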

How do I register a multi-container model in Model Registry in AWS SageMaker?

I have created a multi-container model in a SageMaker notebook and deployed it through an endpoint. But while attempting to do the same through a SageMaker Studio Project (build, train and deploy model template), I need to register the multi-container model through a 'sagemaker.workflow.step_collections.RegisterModel' step, which I am unable to do.
The boto3 client method create_model() has a parameter 'InferenceExecutionConfig' where 'Mode' is set to 'Direct' for a multi-container model:
InferenceExecutionConfig={
'Mode': 'Serial'|'Direct'
}
To my understanding, a multi-container model is created through a boto3 API call. I haven't found a way to create a 'sagemaker.model.Model' instance through the SageMaker Python SDK that has the above parameter, hence I am not able to register it.
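For reference, a minimal boto3 sketch of what that call looks like for direct invocation (the image URIs, S3 paths, and role ARN are placeholders):

import boto3

sm = boto3.client("sagemaker")

# Placeholder images, model artifacts, and role -- illustrative only.
sm.create_model(
    ModelName="my-multi-container-model",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/MySageMakerRole",
    Containers=[
        {"ContainerHostname": "model-a",
         "Image": "<ecr-image-uri-a>",
         "ModelDataUrl": "s3://my-bucket/model-a.tar.gz"},
        {"ContainerHostname": "model-b",
         "Image": "<ecr-image-uri-b>",
         "ModelDataUrl": "s3://my-bucket/model-b.tar.gz"},
    ],
    # 'Direct' lets you invoke each container individually via
    # TargetContainerHostname when calling invoke_endpoint.
    InferenceExecutionConfig={"Mode": "Direct"},
)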
Using 'sagemaker.pipeline.PipelineModel' will result in a serial pipeline, whereas I want to invoke each model directly through an endpoint.
Even the boto3 API method create_model_package(), which creates and registers the model to a ModelPackageGroup, does not have the 'InferenceExecutionConfig' parameter that I could use to register the multi-container model.
My aim is to create a SageMaker Studio Project with a multi-container endpoint which can be used to fetch outputs from two models in the project. If I'm missing something or there is some other approach to achieve this, please let me know that too.

Loading TensorFlow.js model from File Server

I am trying to load a TensorFlow.js model via the HTTP protocol. TensorFlow.js requires me to store the 'model.json' and 'weights.bin' files in the same folder, but I can only pass 'model.json' as a parameter; it refers to the binary file by itself. That is how it works as far as I know.
For now, in the local environment, I am loading the model from localhost (http://127.0.0.1:8080) and it works fine.
However, the actual application accepts the HTTPS protocol only. So I tried storing the model and weights in the same S3 bucket and calling it via Lambda, but it seems only 'model.json' is retrieved. I am thinking of using an EC2 instance running a Python Flask server, but the result seems to be the same: only model.json is retrieved, not the binary files.
Is there any way I can retrieve 'model.json' together with the weight file it refers to? Is there any way to host a file server remotely over HTTPS?
TFJS downloads the model JSON, parses it, and uses whatever paths are specified in the JSON - you can edit that file and set any URL you want for the weights.
Alternatively, you can also use lower-level methods to load the weights manually (in case you want a custom loader, etc.), but leave that for the future until you're more comfortable with TFJS.
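As an illustration of editing the model JSON, here is a small Python sketch that rewrites the weight paths to absolute HTTPS URLs (the hosting URL is a placeholder):

import json

# Rewrite the relative weight-shard paths inside model.json so TFJS
# fetches them from an absolute HTTPS URL instead of the same folder.
with open("model.json") as f:
    model = json.load(f)

for group in model["weightsManifest"]:
    group["paths"] = ["https://example.com/tfjs/" + p for p in group["paths"]]

with open("model.json", "w") as f:
    json.dump(model, f)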

Failed to deploy component - "Cannot deserialize the current JSON object ..."

Background Information
TFS 2015 RC2
Release Management Server 2015
Azure VM with 2015 deployment agent
Physical local machine with 2015 deployment agent
Both machines retrieve the drop location using the 'Through Release Management Server over HTTP(S)' option. Currently we are using the HTTP side of things over port 1000.
Workflow
Stop App Pool (Working)
Stop Website (Working)
Copy website directory to backup location (Working)
Backup Database (Working)
Deploy Component (Not Working), using either
xcopy
msdeploy (web deploy package)
The Error (TL;DR)
The same error is received every time; it doesn't matter which machine or which deployment method. The component always fails with a JSON.NET issue.
7/22/2015 3:03:39 PM - Error - (13704, 104) - Cannot deserialize the current JSON object (e.g. {"name":"value"}) into type 'System.String[]' because the type requires a JSON array (e.g. [1,2,3]) to deserialize correctly.
To fix this error either change the JSON to a JSON array (e.g. [1,2,3]) or change the deserialized type so that it is a normal .NET type (e.g. not a primitive type like integer, not a collection type like an array or List<T>) that can be deserialized from a JSON object. JsonObjectAttribute can also be added to the type to force it to deserialize from a JSON object.
Path 'ErrorMessage', line 1, position 16.: \r\n\r\n at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
at Newtonsoft.Json.JsonConvert.DeserializeObject(String value, Type type, JsonSerializerSettings settings)
at Newtonsoft.Json.JsonConvert.DeserializeObject[T](String value, JsonSerializerSettings settings)
at Microsoft.TeamFoundation.Release.Data.Proxy.RestProxy.BaseDeploymentControllerServiceProxy.GetPackageFileInfos(String packageLocation)
at Microsoft.TeamFoundation.Release.DeploymentAgent.Services.Deployer.HttpPackageDownloader.CopyPackageAndUnpackIt(String packageSourceLocation, String filesDestinationLocation)
at Microsoft.TeamFoundation.Release.DeploymentAgent.Services.Deployer.ComponentProcessor.CopyComponentFiles()
at Microsoft.TeamFoundation.Release.DeploymentAgent.Services.Deployer.ComponentProcessor.DeployComponent()
Update (Workaround)
As a workaround, if I edit the build configuration to have a UNC path as the drop location, the deployment is successful. However, I want to use the 'Copy build output to server' option.
Uninstalling the deployer and installing the RM 2015 RTM deployer should fix this issue.
There was an issue in previous RM releases where the Newtonsoft.Json DLL was not getting upgraded during the deployer auto-upgrade.
I don't think MS really tested the agent releases with Update 1.
I got the same error, which is actually just a generic error message when using deployment through HTTP. When I converted it to deployment through UNC paths, I found out what the problem was.
As you might know, with TFS 2015 you had to name the release components exactly after the artifact names. So artifact 'WebApp X' has a release component called 'WebApp X' in RM with the subpath 'WebApp X'.
In my release configuration I have 3 different components (and artifacts).
So on the disk it was:
'\build\WebApp X'
'\build\WebApp Y'
'\build\WebApp Z'
Worked perfectly with 2015 RTM.
Now after Update 1 it looks for the following:
'\build\WebApp X\WebApp X'
'\build\WebApp X\WebApp Y'
'\build\WebApp X\WebApp Z'
I don't know why it does this or how to solve it yet, but I manually altered the folders in the artifacts drop location and RM picked it up fine. So I am still looking into how to fix this properly.
This JSON issue occurs if the user under which the RM server app pool is running doesn't have access to the drop location of the component and you have selected the 'Through RM server over HTTP(S)' option.
So as a fix, you can give the app pool user permissions to access the drop.
You can see the actual error in the server logs:
"Package location '\share\' does not exists or Application Pool user does not have access"

This resource can not be previewed at the moment. - CKAN

I’m running CKAN 2.2 on Ubuntu 12.04 LTS (GNU/Linux 3.2.0-23-generic x86_64).
I have uploaded a dataset to the CKAN instance. It was uploaded successfully and can be downloaded as well. But when I try to preview the data, I end up with the error below.
This resource can not be previewed at the moment.
When I click on “Click here for more information”, it says:
Could not load preview: DataProxy returned an error (Request Error: Backend did not respond after 10 seconds)
How can I fix this error?
The problem is that the data proxy (which is used to transform CSV into something that the data preview can understand) is a server on the internet. Consequently, the files you want to preview have to be publicly accessible from the internet as well. localhost is your own computer, which means the data proxy cannot access it. To solve this, either put the file in the DataStore using the datastorer, or put the file on a server and provide the correct URL.
This happens because the data proxy, which is used to transform the data into something we can preview with Recline, needs the files to be accessible from the internet. The best solution is to store the data in the DataStore, and then the preview will work.
Extracted from here & here
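As an illustration of the DataStore route, pushing records through CKAN's datastore_create action could look like the sketch below (the host, resource id, and API key are placeholders):

import requests

# Placeholder host, resource id, and API key -- adjust to your instance.
response = requests.post(
    "https://your-ckan-host/api/3/action/datastore_create",
    headers={"Authorization": "YOUR-API-KEY"},
    json={
        "resource_id": "<resource-id>",   # the resource to back with the DataStore
        "force": True,                    # allow writing to a read-only resource
        "records": [{"name": "foo", "value": 1}],
    },
)
response.raise_for_status()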
Sometimes you get the same message as the title question:
This resource can not be previewed at the moment.
But when you click on “Click here for more information”, it says:
Could not load preview: DataProxy returned an error (Data transformation failed. error: An error occured while connecting to the server: DNS lookup failed for URL: http:///dataset/c3ce226b-73bd-4b06-9d1b-ffea13d5f770/resource/580fb05f-6d86-4748-aac7-560b904a208f/download/foo.csv)
In this case, probably the datapusher plugin is not working. First, follow the instructions for DataPusher in the CKAN manual. If you already did this, or you installed CKAN from a package, check the CKAN configuration in the production.ini (or development.ini) file. A small checklist to solve the problem (a config sketch follows the list):
add datapusher in "ckan.plugins"
set "ckan.site_url"
set "ckan.datapusher.url"
check Apache/nginx server logs (/var/log/apache2/datapusher.*.log, /var/log/apache2/ckan_default*.log)
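For reference, the relevant lines in the .ini file might look like the following sketch (the site URL is a placeholder, and 8800 is assumed here as the DataPusher port; adjust both to your setup):

ckan.plugins = ... datastore datapusher
ckan.site_url = https://your-ckan-host
ckan.datapusher.url = http://127.0.0.1:8800/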
In my case, the issue was in my development.ini (or production.ini for you, maybe) file, where the lines for DataPusher's configuration were commented out with a # at the start of the line. The CKAN storage config line was also commented out.
I uncommented those lines and it was solved.
