Sagemaker Inference Endpoint with fitted Encoder - amazon-sagemaker

Since I'm not getting anywhere by reading documentation and blog posts, I'll ask here:
I want to deploy a SageMaker endpoint by fitting a SageMaker Pipeline. I want an endpoint which is backed by a PipelineModel. This PipelineModel should consist of two models: a fitted model which encodes my data, and a model which predicts with an XGBoost estimator. I am following this documentation: enter link description here
But this example doesn't show how to integrate the fitted preprocessor model in a pipeline step. Which step do I have to use? A TrainingStep? Thanks in advance. I am desperate.

Check out this official example: Train, register, and deploy a pipeline model.
There are two variations to keep in mind:
For models that need training (usually those based on TensorFlow/PyTorch), a TrainingStep must be used so that the output (the model artifact) is correctly (and automatically) generated and can be used later for inference.
For models produced by simply fitting on the data (e.g., a scaler with sklearn), you could create a TrainingStep in disguise (it adds an extra component to the pipeline; it is not really correct, but it is a workaround). The cleaner method is to configure the preprocessing script so that it internally saves a model.tar.gz with the necessary files (e.g., pickle or joblib objects) inside; that archive can then be used in later steps as model_data. In fact, once you have a model.tar.gz, you can define a Model of various types (e.g., an SKLearnModel) that is already fitted, as sketched below.
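To make the second variation concrete, here is a minimal sketch of a preprocessing script that fits an encoder and packages it as model.tar.gz; the paths, column handling, and choice of StandardScaler are illustrative assumptions, not part of the original answer.

import tarfile

import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Inside a SageMaker Processing job, /opt/ml/processing/* are the conventional
# mount points wired up via ProcessingInput/ProcessingOutput (placeholder paths).
df = pd.read_csv("/opt/ml/processing/input/train.csv")

# Fit the encoder on the training data and write out the transformed set.
scaler = StandardScaler()
train_scaled = scaler.fit_transform(df)
pd.DataFrame(train_scaled).to_csv(
    "/opt/ml/processing/train/train_scaled.csv", index=False, header=False
)

# Persist the fitted encoder and package it the way SageMaker expects model
# artifacts: a model.tar.gz written to a dedicated output channel.
joblib.dump(scaler, "scaler.joblib")
with tarfile.open("/opt/ml/processing/model/model.tar.gz", "w:gz") as tar:
    tar.add("scaler.joblib")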
At this point, you define your PipelineModel with the trained/fitted models, and you can either proceed to direct endpoint deployment or go through the Model Registry for a more robust approach.
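As a sketch of that final assembly (assuming the preprocessing step above uploaded its model.tar.gz to S3 and a training job produced the XGBoost artifact; the S3 URIs, entry points, and framework versions below are placeholders):

import sagemaker
from sagemaker.pipeline import PipelineModel
from sagemaker.sklearn.model import SKLearnModel
from sagemaker.xgboost.model import XGBoostModel

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Fitted encoder packaged by the preprocessing script (no TrainingStep needed).
sklearn_model = SKLearnModel(
    model_data="s3://my-bucket/preprocess/model.tar.gz",  # placeholder URI
    role=role,
    entry_point="inference.py",  # your encode/transform handler (assumption)
    framework_version="1.2-1",
    sagemaker_session=session,
)

# Trained XGBoost predictor (artifact from a TrainingStep / training job).
xgb_model = XGBoostModel(
    model_data="s3://my-bucket/training/model.tar.gz",  # placeholder URI
    role=role,
    entry_point="predict.py",  # placeholder inference script
    framework_version="1.5-1",
    sagemaker_session=session,
)

# Chain them so the endpoint runs each request through the encoder first,
# then the XGBoost model.
pipeline_model = PipelineModel(
    models=[sklearn_model, xgb_model],
    role=role,
    sagemaker_session=session,
)

# Either deploy directly...
pipeline_model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
# ...or register the PipelineModel in the Model Registry and deploy from there.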

Related

Is there a way to show pdf in its original structure in the human review custom entity labelling in aws sagemaker?

I have modified this sample to read PDFs in tabular format. I would like to keep the tabular structure of the original PDF when doing the human review process. I notice the custom worker task template uses the crowd-entity-annotation element, which seems to read only text. I am aware that the human review process reads from an S3 key which contains raw text written by the Textract process.
I have been considering writing to S3 using tabulate but I don't think that is the best solution. I would like to keep the structure and still have the ability to annotate custom entities.
Comprehend now natively supports detecting custom-defined entities in PDF documents. To do so, you can try the following steps:
Follow this github readme to start the annotation process for PDF documents.
Once the annotations are produced, you can use the Comprehend CreateEntityRecognizer API to train a custom entity model for semi-structured documents.
Once the entity recognizer is trained, you can use the StartEntitiesDetectionJob API to run inference on PDF documents.
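For reference, a hedged boto3 sketch of those two calls; the role ARN, S3 locations, entity types, and job/recognizer names are placeholders you would replace with the outputs of the annotation step above.

import boto3

comprehend = boto3.client("comprehend")

# 1) Train a custom entity recognizer from the PDF annotation output.
recognizer = comprehend.create_entity_recognizer(
    RecognizerName="pdf-custom-entities",  # placeholder name
    DataAccessRoleArn="arn:aws:iam::111122223333:role/ComprehendDataAccess",  # placeholder
    LanguageCode="en",
    InputDataConfig={
        "EntityTypes": [{"Type": "MY_ENTITY"}],  # placeholder entity type
        "Documents": {"S3Uri": "s3://my-bucket/docs/", "InputFormat": "ONE_DOC_PER_FILE"},
        "Annotations": {"S3Uri": "s3://my-bucket/annotations/"},  # placeholder
    },
)
recognizer_arn = recognizer["EntityRecognizerArn"]

# 2) Once the recognizer reaches TRAINED, run inference over new PDF documents.
comprehend.start_entities_detection_job(
    JobName="pdf-entities-job",  # placeholder name
    EntityRecognizerArn=recognizer_arn,
    DataAccessRoleArn="arn:aws:iam::111122223333:role/ComprehendDataAccess",  # placeholder
    LanguageCode="en",
    InputDataConfig={"S3Uri": "s3://my-bucket/pdfs/"},  # placeholder
    OutputDataConfig={"S3Uri": "s3://my-bucket/comprehend-output/"},  # placeholder
)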

Environment variable as custom metadata type in Salesforce

I am trying to represent environment variables in the Salesforce codebase and came across Custom Metadata Types. So based on which Sandbox I am in, I want to vary the baseURL of an external service that I am hitting from my apex class. I want to avoid hard coding anything in the class, and hence trying to find out an environment variable like solution.
How would you represent the URL as a custom metadata type? Also, how can I access it in the class? What happens when a QA sandbox is refreshed from prod? Do the custom metadata type records get overridden?
How are you calling that external service? If it's truly a base URL, you might be better off using a "named credential" for it. It'll abstract the base URL away for you and include authentication or a certificate if you have to present any...
Failing that - custom metadata might be a poor choice. They're kind of dictionary objects; you can add more (but not from Apex), but if you deploy stuff using Git/Ant/SFDX CLI rather than change sets it'd become a bit of a pain: you'd need a different custom metadata value for sandbox vs. prod. That kind of defeats the purpose.
You might be better off using a custom setting instead (hierarchy custom settings are enabled by default; for list custom settings you'd have to flip a checkbox in Setup. List is useful if you need key-value pairs, similar to custom metadata): https://salesforce.stackexchange.com/questions/74049/what-is-the-difference-between-custom-settings-and-custom-metadata-types
And you can modify them with Apex too, which means that in an ideal world you could have a "postcopy" class run as soon as the sandbox is refreshed that overwrites the custom setting with the non-prod value. For a named credential I don't think you can pull that off; you'd need a mini deployment that changes it, or a manual step (have you seen https://salesforce.stackexchange.com/q/955/799 ?)

Use Tensorflow trained models as a service

I just started with TensorFlow. I was able to successfully train it on a data set that I created. Now the question is how I will be able to use this model to make predictions. I want to expose it as a REST service to which I can pass some values and get the predictions back as a response. Any helpful links are also welcome. The model is currently on a VM.
Thanks :)
Have you seen Cloud ML on GCP? It might be exactly what you're looking for.
https://cloud.google.com/ml/
You might need to make a few tweaks to the architecture of your model - like variable batch sizes and adding inputs/outputs to collections - but they are well explained in the documentation.
If performance, scalability, and a short downtime when you decide to update the model are not an issue, you could also consider just having a simple Flask server with TensorFlow installed on it.
If you don't want to use Cloud ML and need to serve a large number of requests, then look into TensorFlow Serving.
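For the Flask option, here is a minimal sketch; it assumes a model exported with model.save("my_model"), and the route name and JSON payload format are made up for illustration.

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)
model = tf.keras.models.load_model("my_model")  # placeholder path to the exported model

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"instances": [[...feature values...]]}.
    payload = request.get_json()
    inputs = np.array(payload["instances"], dtype=np.float32)
    predictions = model.predict(inputs)
    return jsonify({"predictions": predictions.tolist()})

if __name__ == "__main__":
    # Fine for prototyping; put a proper WSGI server in front for production.
    app.run(host="0.0.0.0", port=8080)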
First of all: try saving and loading your model: https://www.tensorflow.org/versions/r0.11/how_tos/variables/index.html
Then, after training, you can simply call:
rest_prediction = sess.run(prediction_tensor, feed_dict={x_tensor: user_input})
An important difference is that during training you have batch_size inputs, but when you have a REST server you have one input. The shape (https://www.tensorflow.org/versions/r0.11/resources/dims_types.html) of your tensors should therefore be variable. How you can achieve this is described here: https://www.tensorflow.org/versions/r0.11/resources/faq.html#tensor-shapes
If you post a short and simple code snippet we might be able to help you better.
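To illustrate the save/restore and variable-batch-size points above, here is a minimal sketch written with the TF 1.x-style Session API (via tf.compat.v1); the model, layer sizes, and checkpoint path are placeholders.

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Leaving the first dimension as None lets the same graph accept a full
# training batch or a single request from the REST server.
x_tensor = tf.placeholder(tf.float32, shape=[None, 4], name="x")
w = tf.Variable(tf.zeros([4, 1]))
b = tf.Variable(tf.zeros([1]))
prediction_tensor = tf.matmul(x_tensor, w) + b

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training would happen here ...
    saver.save(sess, "./model.ckpt")  # placeholder checkpoint path

# Later, inside the REST handler process:
with tf.Session() as sess:
    saver.restore(sess, "./model.ckpt")
    user_input = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)  # a single input row
    rest_prediction = sess.run(prediction_tensor, feed_dict={x_tensor: user_input})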

Database Driven Model Mapping in Java

So, I have a project where I get data from a dozen different sources, some are database objects, most often the data is in different JSON formats, or often XML formats. So, I need to take this disparate data and pull it into one single clean managed object that we control.
I have seen dozens of different posts on various tools to do object to object mapping. Orika being one of them, etc. But the problem is that Orika, like many of these still need solid classes defined to do the mapping. If there is a change to the mapping, then I have to change my class, re-commit it, then do a build and deploy new code ... BTW, testing would also have to be done like any code change. So, maybe some of these tools aren't a great solution for me.
Then I was looking to do some sort of database-driven mapping, where I have a source, a field, and then the new field or function I would like to take it to. So, with a database-driven tool, I could modify the fields in the database, and everything would keep working as it should. I could always create a front-end to modify this tool.
So, with that ... I am asking if there is any database-driven tool where I can map field to field, or fields to functions type of mapping? Drools was my first choice, but I don't know if it is my best choice? Maybe it is overkill for my needs? So, I was looking for advice on what might be the best tool to do my mapping.
Please let me know if you need any more information from me, and thanks for all the help!
Actually Orika can handle dynamic data sources like that; there is even an example of how to convert from an XML Element (DOM API) or even a JsonObject.
You can use an XML parser to convert your data into an Element object, or Jackson to get a JsonObject.
Then define your class map between your "canonical" Java class and these dynamic "classes".
See "Customizing the PropertyResolverStrategy" at http://orika-mapper.github.io/orika-docs/advanced-mappings.html
Here is an example of Orika mapping a MongoDB DBObject to a Java Bean:
https://gist.github.com/elaatifi/ade7321a1405c61ff8a9
However, converting JSON is more straightforward than XML (the semantics of attributes/children/custom tags do not map cleanly to JavaBeans).

Using Skue or similar frameworks to build REST API on google-app-engine

Searching for ways to build REST APIs, I found Skue (https://code.google.com/p/skue/). However, there is not much information on the site. My plan is to build a REST API strictly as follows:
Models << Business logic << RESTful resources.
What this means is: the models are accessed exclusively by the business logic; the RESTful resources interface is the only layer a client has direct access to. I am specifying all this to avoid people suggesting the appengine-rest-server.
My question is: has anyone ever successfully used Skue? If so, do you have any examples you wouldn't mind sharing? GET and POST would be sufficient, but more is welcome. If not Skue, are there any frameworks out there that allow building such REST APIs on top of Google App Engine?
I'm the author of Skuë. Skuë means "mouse" in Bribrí which is the language of an indigenous group of people of Costa Rica, my Country.
I know there isn't enough information on the site (https://code.google.com/p/skue/) for developers who want to use it in their own projects. I'm sorry for that; I just haven't had the time to write proper documentation, since this is a side project and not my daily work.
However, I'm willing to help you ramp up so you can use it. The first thing to notice is the small example that is part of the source code. Go to the site, then click on Source -> Browse, and expand the "app" branch.
The code inside the "app" folder represents your own API implementation. The package "skue" contains the actual implementation of the library, so basically you create your Python project for Google App Engine and include the skuë package directly in it.
Now overwrite your main.py file with the content of the downloaded main.py: main.py on the Skuë project.
The most important part of that file is where you put your own routes to your resource implementations. Notice the use of "ContactResource" here:
TASK_HANDLERS = [
    # Background task routes would go here (none in this example).
]
API_HANDLERS = [
    # Map a URL pattern to the resource class that handles it.
    ('/contacts/(.*)', ContactResource)
]
# The root path serves the self-generated API documentation.
API_DOC = [ ('/', ApiDocumentationResource) ]
Browse to the contact resource implementation.
There are a lot of things going on under the hood there, but the idea is that you don't have to worry about those.
You need to inherit from the proper Resource parent class depending on the kind of resource you want to create; there are four basic types:
DocumentResource: A document resource is a singular concept that is akin to an object instance or database record.
CollectionResource: A collection resource is a server-managed directory of resources. Clients may propose new resources to be added to a collection. However, it is up to the collection to choose to create a new resource, or not.
StoreResource: A store is a client-managed resource repository. A store resource lets an API client put resources in, get them back out, and decide when to delete them.
ControllerResource: A controller resource models a procedural concept. Controller resources are like executable functions, with parameters and return values; inputs and outputs.
Like a traditional web application’s use of HTML forms, a REST API relies on controller resources to perform application-specific actions that cannot be logically mapped to one of the standard methods (create, retrieve, update, and delete, also known as CRUD).
Now take a look at the "describe_resource" implementation in the ContactResource example. When you inherit from one of the basic resource types described above, the next step is to programmatically describe your resource to the outside world using that method. The underlying Skuë implementation uses that method to validate required parameters and also to self-describe the endpoints when you perform an OPTIONS request on them.
And the final step is for you to implement the methods (CRUD) that you want to handle for your resource.
Again with the ContactResource example, that resource handles the creation, update and read of Contact items.
I hope this helps you at least to understand how to start using the library. I will create better tutorials in the future, though.
In the meantime you can contact me via email: greivin.lopez#gmail.com and I will send you a more elaborate example or even something that matches your requirements.
Important Note: Currently the Skuë project only supports responses in JSON format. If you plan to use another format you will need to create the proper classes to handle it.
Greetings from Costa Rica.
I haven't used skue, but what you're looking for sounds like a good fit for Google Cloud Endpoints. See my previous answers on the subject for more details.
