NVIDIA Triton vs TorchServe for SageMaker Inference - amazon-sagemaker

NVIDIA Triton vs TorchServe for SageMaker inference? When to recommend each?
Both are modern, production grade inference servers. TorchServe is the DLC default inference server for PyTorch models. Triton is also supported for PyTorch inference on SageMaker.
Does anyone have a good comparison matrix for the two?

Some important notes on where the two serving stacks differ:
TorchServe does not provide the Instance Groups feature that Triton does (that is, stacking many copies of the same model, or even different models, onto the same GPU). This is a major advantage for both real-time and batch use cases, as the performance increase is almost proportional to the model replication count (i.e. 2 copies of the model get you almost twice the throughput and half the latency; check out a BERT benchmark of this here). It is hard to match a feature that is almost like having 2+ GPUs for the price of one.
If you are deploying PyTorch DL models, odds are you want to accelerate them with GPUs. TensorRT (TRT) is a compiler developed by NVIDIA that automatically quantizes and optimizes your model graph, which can mean another huge speedup, depending on GPU architecture and model. It is probably the best way of automatically optimizing your model to run efficiently on GPUs and make good use of Tensor Cores. Triton has native integration for running TensorRT engines, as they're called (it can even convert your model to a TRT engine automatically via the config file), while TorchServe does not (even though you can use TRT engines with it).
There is more parity between the two when it comes to other important serving features: both have dynamic batching support, you can define inference DAGs with both (not sure if the latter works with TorchServe on SageMaker without a big hassle), and both support custom code/handlers instead of just being able to serve a model's forward function.
Finally, MME on GPU (coming shortly) will be based on Triton, which is a valid argument for customers to get familiar with it so that they can quickly leverage this new feature for cost-optimization.
Bottom line: I think that Triton is just as easy (if not easier) to use, a lot more optimized/integrated for taking full advantage of the underlying hardware (and will be updated to stay that way as newer GPU architectures are released, enabling an easy move to them), and in general blows TorchServe out of the water performance-wise when its optimization features are used in combination.
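To make the Instance Groups and dynamic batching points above concrete, here is a minimal sketch that renders the relevant part of a Triton config.pbtxt. The helper name render_triton_config and the model name are hypothetical; the instance_group, kind, count, max_batch_size and dynamic_batching fields are standard Triton model configuration fields. The input/output tensor sections that a PyTorch model normally needs are left out, so treat this as a fragment rather than a complete config.

```python
def render_triton_config(model_name: str, copies_per_gpu: int, max_batch_size: int = 8) -> str:
    """Return a partial Triton config.pbtxt as text.

    Hypothetical helper for illustration only: it emits the instance_group
    and dynamic_batching settings discussed above and omits the input/output
    tensor definitions that a real PyTorch model config would also need.
    """
    if copies_per_gpu < 1:
        raise ValueError("copies_per_gpu must be at least 1")
    return (
        f'name: "{model_name}"\n'
        'platform: "pytorch_libtorch"\n'
        f"max_batch_size: {max_batch_size}\n"
        "instance_group [\n"
        "  {\n"
        f"    count: {copies_per_gpu}\n"  # number of model copies per GPU
        "    kind: KIND_GPU\n"
        "  }\n"
        "]\n"
        "dynamic_batching { }\n"  # enable dynamic batching with default settings
    )


if __name__ == "__main__":
    # "bert_pt" is an assumed model name, not one taken from the thread.
    print(render_triton_config("bert_pt", copies_per_gpu=2))
```

Raising copies_per_gpu is the knob the answer refers to; how much throughput it actually buys depends on the model and the GPU, so it is worth benchmarking a few values.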

I don't have enough reputation to reply in the comments, so I'm writing this as an answer.
MME stands for multi-model endpoints. MME lets multiple models share the GPU instances behind an endpoint and dynamically loads and unloads models based on incoming traffic.
You can read more about it at this link.
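As a rough illustration of "dynamically loads and unloads models based on incoming traffic", here is a toy in-memory cache. Everything in it is an assumption made for the sketch: the ModelCache class is hypothetical, and the least-recently-used eviction rule is a stand-in, not a description of how SageMaker actually decides what to unload.

```python
from collections import OrderedDict
from typing import Callable, List


class ModelCache:
    """Toy model cache: keeps at most `capacity` models loaded.

    Assumption for illustration: the least recently used model is
    unloaded first. The real service may use a different policy.
    """

    def __init__(self, capacity: int, loader: Callable[[str], object]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader
        self._loaded: "OrderedDict[str, object]" = OrderedDict()

    def get(self, model_name: str) -> object:
        """Return the model for a request, loading it on first use."""
        if model_name in self._loaded:
            # Mark as most recently used.
            self._loaded.move_to_end(model_name)
            return self._loaded[model_name]
        if len(self._loaded) >= self.capacity:
            # Unload the least recently used model to make room.
            self._loaded.popitem(last=False)
        model = self.loader(model_name)
        self._loaded[model_name] = model
        return model

    def loaded_models(self) -> List[str]:
        """Names of loaded models, least recently used first."""
        return list(self._loaded)


if __name__ == "__main__":
    cache = ModelCache(capacity=2, loader=lambda name: f"<weights of {name}>")
    for request in ["model-a", "model-b", "model-a", "model-c"]:
        cache.get(request)
    print(cache.loaded_models())
```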

Related

Difference between SageMaker instance count and Data parallelism

I can't understand the difference between SageMaker instance count and data parallelism. We already have a parameter that specifies how many instances to train the model on when we write a training script using sagemaker-sdk.
However, at re:Invent 2021, the SageMaker team launched and demonstrated SageMaker managed data parallelism, and this feature also provides distributed training.
I've searched a lot of sites for an explanation, but I can't find a really clear one. Here is a link that explains a closely related concept: https://godatadriven.com/blog/distributed-training-a-diy-aws-sagemaker-model/
Increasing the instance count makes SageMaker launch that many instances and copy the data to them. This only enables parallelization at the infrastructure level. To really carry out distributed training, you need support at the framework/code level: the code must know how to aggregate and send gradients across all the GPUs/instances in the cluster and, in some cases, how to distribute the data as well (usually when using DataLoaders). To achieve this, SageMaker has a distributed data parallelism feature built in. It is similar to alternatives such as Horovod, PyTorch DDP, etc.
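The "aggregate gradients across instances" part can be sketched without any framework. The toy below simulates several workers in one process: each computes a gradient on its own data shard, the gradients are averaged (the role an all-reduce plays in a real cluster), and every worker applies the same update. All function names are hypothetical, and this is not how the SageMaker library, Horovod or PyTorch DDP are implemented internally; it only shows the idea.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def gradient(w: float, batch: Sequence[Sample]) -> float:
    """Gradient of the mean squared error of y ~ w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def shard(data: Sequence[Sample], num_workers: int) -> List[Sequence[Sample]]:
    """Give each simulated worker an interleaved slice of the data."""
    return [data[i::num_workers] for i in range(num_workers)]


def all_reduce_mean(values: Sequence[float]) -> float:
    """Stand-in for an all-reduce: every worker ends up with the mean."""
    return sum(values) / len(values)


def data_parallel_step(w: float, data: Sequence[Sample], num_workers: int, lr: float) -> float:
    """One training step: local gradients, averaged, then a shared update."""
    local_grads = [gradient(w, part) for part in shard(data, num_workers)]
    return w - lr * all_reduce_mean(local_grads)


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, data, num_workers=4, lr=0.01)
    print(w)
```

Without the averaging step, each instance would simply train its own copy of the model, which is what you get from raising the instance count alone.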

JitterBit vs Dell Boomi vs Celigo

We've narrowed our selection for an iPaaS down to the above 3.
Initially we're looking to pass data from a cloud-based HR system to NetSuite, and from NetSuite to Salesforce, and sometimes JIRA.
I've come from a MuleSoft background, which I think would be too complex for this. On the other hand, it seems that Celigo is VERY drag and drop, and there's not much room for modification/customisation.
Of the three, do you have any experience/recommendations? We aren't looking for any code heavy custom APIs, most will just be simple scheduled data transfers but there may be some complexity within the field mapping, and we want to set ourselves up for the future.
I spent a few years removing Celigo from NetSuite and Salesforce. The best way I can describe Celigo is that it is like the old school anti-virus programs which were often worse than the viruses... lol... It digs itself into the end system, making removing it a nightmare.
Boomi does the job, but is very counter-intuitive, and overly complex. You can't do everything from one screen, you can't easily bounce back and forth between tasks/operations/etc. And, sometimes it is very difficult to find where endpoints are used, as they are not always shown in their "where is this used" feature. Boomi has a ton of endpoint connectors pre-built (the most, I believe), but I have not seen an easy way to just create your own. Boomi also has much more functionality than just the integrations, if that is something that may be needed.
Jitterbit, my favorite, is ridiculously simple to use. You can access everything from one main screen, and you can connect to anything (as long as it can reach out to the network, or you can reach it via the network, internal or external). Jitterbit has a lot of pre-built endpoint connectors. It is also extremely easy to just create a connection to anything you want. The win with Jitterbit is that it is super easy to use, super easy to learn, it always works, and they have amazing support (if you need it). I have worked with Jitterbit the most (about 6 years), and I have never been unable to complete an integration task in less than a couple of days, max.
I have extensive experience with the Dell Boomi platform but none with JitterBit or Celigo. Dell Boomi offers a very versatile and well-supported iPaaS solution. The technical challenges of Boomi are some UI/usability issues (#W3BGUY mentioned the main ones) and the lack of out-of-the-box support for CI/CD and DevOps processes (code management, versioning, deployments etc.).
One more important component to consider here is the pricing of the platform. Boomi charges its clients a yearly price per connection. A connection is defined as a unique combination of URL, username and password. The yearly license cost varies and can range anywhere from $1,000 to $12,000 per license per year. The price depends greatly on your integration landscape and the discounts provided, so I would advise engaging with the vendor early to understand your costs. It would be great to hear from others on pricing for JitterBit and Celigo.
Boomi is also more than just an iPaaS platform. They offer other modules of their platform to customers: API Management, Boomi Flow (workflow and automation module), Master Data Hub (master data management). Some of these modules are well developed and some are in their infancy (API Management).
From my limited experience with MuleSoft platform, I share the OP's sentiments about it being too complex for simple integrations. They do provide great CI/CD and DevOps functionality though if that is something that is needed.
There is no simple answer to a question like this. One needs to look at multiple aspects of the platform and make a decision based on a multitude of factors. I would advise looking at Gartner and Forrester reports for general guidelines and working out the pricing (initial and recurring) with the vendor.
I have only used Jitterbit, so can only comment on that. It works fine. It is pretty intuitive and easy to use, but does have some flexibility with writing your own queries, defining and mapping file formats, and choosing different transfer protocols.
I've only used the free version (which you need to host somewhere and also is not supported) and it was good enough for production tasks. If you have the luxury of time, I'd say download it and try it out. If it works for you, throw it on a server or upgrade to the cloud version.
One note: Jitterbit uses background services. If you run it locally and then decide to migrate your account to a server, you need to stop those services on your local machine. Otherwise, it will try to run jobs from both locations, and that doesn't turn out well.
Consider checking out Choreo as well. It has a novel simultaneous code + low-code approach to integration development, and it provides rich AI support for performance monitoring, debugging, and data mapping.
Disclaimer: I'm a member of the project.

How to use Cloud ML Engine for Context Aware Recommender System

I am trying to build Context Aware Recommender System with Cloud ML Engine, which uses context prefiltering method (as described in slide 55, solution a) and I am using this Google Cloud tutorial (part 2) to build a demo. I have split the dataset to Weekday and Weekend contexts and Noon and Afternoon contexts by timestamp for purposes of this demo.
In practice I will learn four models, so that I can context filter by Weekday-unknown, Weekend-unknown, unknown-Noon, unknown-Afternoon, Weekday-Afternoon, Weekday-Noon... and so on. The idea is to use prediction from all the relevant models by user and then weight the resulting recommendation based on what is known about the context (unknown meaning, that all context models are used and weighted result is given).
I need something that responds fast, and it seems I will unfortunately need some kind of middleware if I don't want to do the weighting in the front end.
I know that AppEngine has a prediction mode where it keeps the models in RAM, which guarantees fast responses, as you don't have to bootstrap the prediction models; resolving the context would then be fast.
However, is there a simpler solution that would guarantee similar performance in Google Cloud?
The reason I am using Cloud ML Engine is that when I do context aware recommender system this way, the amount of hyperparameter tuning grows hugely; I don't want to do it manually, but instead use the Cloud ML Engine Bayesian Hypertuner to do the job, so that I only need to tune the range of parameters one to three times per each context model (with an automated script); this saves much of Data Scientist development time, whenever the dataset is reiterated.
There are four possible solutions:
1. Learn 4 models and use SavedModel to save them. Then, create a 5th model that restores the 4 saved models. This model has no trainable weights. Instead, it simply computes the context and applies the appropriate weight to each of the 4 saved models and returns the value. It is this 5th model that you will deploy.
2. Learn a single model. Make the context a categorical input to your model, i.e. follow the approach in https://arxiv.org/abs/1606.07792
3. Use a separate AppEngine service that computes the context and invokes the underlying 4 services, weighs them and returns the result.
4. Use an AppEngine service written in Python that loads up all four saved models and invokes the 4 models and weights them and returns the result.
Option 1 involves more coding and is quite tricky to get right.
Option 2 would be my choice, although it changes the model formulation from what you desire. If you go this route, here's sample code on MovieLens that you can adapt: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/movielens
Option 3 introduces more latency because of the additional network overhead.
Option 4 reduces the network latency of option 3, but you lose the parallelism. You will have to experiment with options 3 and 4 to see which provides better overall performance.
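The weighting step shared by options 3 and 4 can be sketched in a few lines. This is only an illustration under assumptions of mine: the four models are represented by plain callables that return item scores, and the weighting rule (each context dimension gets half the total weight; a known value takes all of its dimension's share, an unknown value splits it evenly) is a made-up placeholder for whatever weighting you actually want.

```python
from typing import Callable, Dict, Optional

Scores = Dict[str, float]          # item id -> predicted score
Model = Callable[[str], Scores]    # user id -> scores


def context_weights(day: Optional[str], time: Optional[str]) -> Dict[str, float]:
    """Hypothetical rule: weights over the four context models, summing to 1."""
    weights: Dict[str, float] = {}
    for known, options in ((day, ("weekday", "weekend")), (time, ("noon", "afternoon"))):
        for name in options:
            if known is None:
                weights[name] = 0.5 / len(options)   # unknown: split evenly
            else:
                weights[name] = 0.5 if name == known else 0.0
    return weights


def recommend(user: str, models: Dict[str, Model],
              day: Optional[str] = None, time: Optional[str] = None) -> Scores:
    """Weighted sum of the scores from every model with non-zero weight."""
    combined: Scores = {}
    for name, weight in context_weights(day, time).items():
        if weight == 0.0:
            continue  # skip models that do not apply to this context
        for item, score in models[name](user).items():
            combined[item] = combined.get(item, 0.0) + weight * score
    return combined


if __name__ == "__main__":
    # Stub models standing in for the four trained context models.
    stubs: Dict[str, Model] = {
        "weekday": lambda user: {"a": 1.0},
        "weekend": lambda user: {"a": 0.0},
        "noon": lambda user: {"a": 0.5},
        "afternoon": lambda user: {"a": 0.5},
    }
    print(recommend("user-1", stubs, day="weekday"))
```

In option 3 each callable would wrap a network call to a deployed model, while in option 4 it would call a model loaded in the same process.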

AWS Sagemaker custom user algorithms: how to take advantage of extra instances

This is a fundamental AWS Sagemaker question. When I run training with one of Sagemaker's built in algorithms I am able to take advantage of the massive speedup from distributing the job to many instances by increasing the instance_count argument of the training algorithm. However, when I package my own custom algorithm then increasing the instance count seems to just duplicate the training on every instance, leading to no speedup.
I suspect that when I package my own algorithm there is something special I need to do to control how it handles the training differently on a particular instance inside my custom train() function (otherwise, how would it know how the job should be distributed?), but I have not been able to find any discussion of how to do this online.
Does anyone know how to handle this? Thank you very much in advance.
Specific examples:
=> It works well in a standard algorithm: I verified that increasing train_instance_count in the first documented sagemaker example speeds things up here: https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model-create-training-job.html
=> It does not work in my custom algorithm. I tried taking the standard sklearn build-your-own-model example and adding a few extra sklearn variants inside of the training and then printing out results to compare. When I increase the train_instance_count that is passed to the Estimator object, it runs the same training on every instance, so the output gets duplicated across each instance (the printouts of the results are duplicated) and there is no speedup.
This is the sklearn example base: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb . The third argument of the Estimator object partway down in this notebook is what lets you control the number of training instances.
Distributed training requires a way to sync the results of the training between the training workers. Most traditional libraries, such as scikit-learn, are designed to work with a single worker and can't simply be used in a distributed environment. Amazon SageMaker distributes the data across the workers, but it is up to you to make sure that the algorithm can benefit from the multiple workers. Some algorithms, such as Random Forest, can take advantage of the distribution more easily, as each worker can build a different part of the forest, but other algorithms need more help.
Spark MLlib has distributed implementations of popular algorithms such as k-means, logistic regression, or PCA, but these implementations are not good enough for some cases. Most of them were too slow, and some even crashed when a lot of data was used for training. The Amazon SageMaker team reimplemented many of these algorithms from scratch to benefit from the scale and economics of the cloud (20 hours of one instance costs the same as 1 hour of 20 instances, just 20 times faster). Many of these algorithms are now more stable and much faster, beyond linear scalability. See more details here: https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html
For the deep learning frameworks (TensorFlow and MXNet), SageMaker uses the built-in parameter server of each framework, but it does the heavy lifting of building the cluster and configuring the instances to communicate with it.
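For the Random Forest case mentioned above, a custom train() function first has to work out which instance it is running on and what share of the work belongs to it. A minimal sketch, assuming the container can read the cluster layout as a dictionary with "hosts" and "current_host" keys (as far as I know, SageMaker writes this to /opt/ml/input/config/resourceconfig.json in the training container); the helper name is hypothetical, and merging the partial forests afterwards is left out.

```python
from typing import Dict, List, Tuple, Union

ResourceConfig = Dict[str, Union[str, List[str]]]


def trees_for_host(resource_config: ResourceConfig, total_trees: int) -> Tuple[int, int]:
    """Return (rank, n_trees) for the current instance.

    Hypothetical helper: splits total_trees as evenly as possible across
    the hosts, giving any remainder to the lowest-ranked hosts.
    """
    hosts = sorted(resource_config["hosts"])
    rank = hosts.index(resource_config["current_host"])
    base, extra = divmod(total_trees, len(hosts))
    return rank, base + (1 if rank < extra else 0)


if __name__ == "__main__":
    # Example layout, written by hand here instead of read from the JSON file.
    config: ResourceConfig = {
        "current_host": "algo-2",
        "hosts": ["algo-1", "algo-2", "algo-3"],
    }
    rank, n_trees = trees_for_host(config, total_trees=100)
    # Each instance would train n_trees trees with a seed derived from rank,
    # so that the instances do not all build the same trees.
    print(rank, n_trees)
```

The same rank can be used to pick a distinct random seed or a distinct slice of the data per instance; without something like this, every instance repeats the same training, which matches the duplicated output described in the question.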

What factors to consider when choosing a Multi-model DBMS? (OrientDB vs ArangoDB)

I am looking to dip my hands into the world of Multi-Model DBMS, I have no particular use cases, just want to start learning.
I find that there are two prominent ones, OrientDB and ArangoDB, but I was unable to find any meaningful, unopinionated comparison between them. Can someone shed some light on the difference in features between the two, and any caveats in using one over the other? If I learn one, would I be able to easily transition to the other?
(I tagged FoundationDB as well, but it is proprietary and I probably won't consider it)
This question asks for a general comparison between OrientDB vs ArangoDB for someone looking to learn about Multi-model DBMS, and not an opinionated answer about which is better.
Disclaimer: I would no longer recommend OrientDB, see my comments below.
I can provide a slightly less biased opinion, having used both ArangoDB and OrientDB. It's still biased as I'm the author of OrientDB's node.js driver - oriento but I don't have a vested interest in either company or product, I've just necessarily used OrientDB more.
ArangoDB and OrientDB are both targeting a similar market and have a lot of similarities:
Both are multi-model, you can use them to store documents, graphs and simple key / values.
Both have support for Gremlin, but it's firmly a second class citizen compared to their own preferred query languages.
Both support server-side "stored procedures" in JavaScript. In both systems this comes via a slightly less than idiomatic JavaScript API, although ArangoDB's is a lot better. This is getting fixed in a forthcoming version of OrientDB.
Both offer REST APIs, both aim to be usable as an "API Server" via JavaScript request handlers. This is a lot more practical in ArangoDB than OrientDB.
Both are distributed under a permissive license.
Both are ACID and have transaction support, but in both the transactions are server-side operations - they're more like atomic batches of commands rather than the kinds of transactions you might be used to in a traditional RDBMS.
However, there are a lot of differences:
ArangoDB has no concept of "links", which are a very useful feature in OrientDB. They allow unidirectional relationships (just like a hyperlink on the web), without the overhead of edges.
ArangoDB is written in C++ (and JavaScript), whereas OrientDB is written in Java. Both have their advantages:
Being written in C++ means ArangoDB uses V8, the same high performance JavaScript engine that powers node.js and Google Chrome. Whereas being written in Java means OrientDB uses Nashorn, which is still fast but not the fastest. This means that ArangoDB can offer a greater level of compatibility with the node.js ecosystem compared to OrientDB.
Being written in Java means that OrientDB runs on more platforms, including e.g. Raspberry PI. It also means that OrientDB can leverage a lot of other technologies written in Java, e.g. OrientDB has superb full text / geospatial search support via Lucene, which is not available to ArangoDB.
OrientDB uses a dialect of SQL as its query language, whereas ArangoDB uses its own custom language called AQL. In theory, AQL is better because it's designed explicitly for the problem. In practice, though, it feels quite similar to SQL but with different keywords, and it is yet another language to learn, while OrientDB's implementation feels a lot more comfortable if you're used to SQL. SQL is declarative whereas AQL is imperative - YMMV here.
ArangoDB is a "mostly-memory" database, it works best when most of your data fits in RAM. This may or may not be suitable for your needs. OrientDB doesn't have this restriction (but also loves RAM).
OrientDB is fully object oriented - it supports classes with properties and inheritance. This is exceptionally useful because it means that your database structure can map 1-1 to your application structure, with no need for ugly hacks like ActiveRecord. ArangoDB supports something fairly similar via models in Foxx, but it's more like an optional addon rather than a core part of how the database works.
ArangoDB offers a lot of flexibility via Foxx, but it has not been designed by people with strong server-side JS backgrounds and reinvents the wheel a lot of the time. Rather than leveraging frameworks like express for their request handling, they created their own clone of Sinatra, which of course makes it almost the same as express (express is also a Sinatra clone), but subtly different, and means that none of express's middleware or plugins can be reused. Similarly, they embed V8, but not libuv, which means they do not offer the same non blocking APIs as node.js and therefore users cannot be sure about whether a given npm module will work there. This means that non trivial applications cannot use ArangoDB as a replacement for the backend, which negates a lot of the potential usefulness of Foxx.
OrientDB supports first class property level and database level indices. You can query and insert into specific indexes directly for maximum efficiency. I've not seen support for this in ArangoDB.
OrientDB is the more established option, with many high profile users. ArangoDB is newer, less well known, but growing fast.
ArangoDB's documentation is excellent, and they offer official drivers for many different programming languages. OrientDB's documentation is not quite as good, and while there are drivers for most platforms, they're community powered and therefore not always kept up to date with bleeding edge OrientDB features.
If you're using Java (or a Java bridge), you can embed OrientDB directly within your application, as a library. This use case is not possible in ArangoDB.
OrientDB has the concept of users and roles, as well as Record Level Security. This may be a killer feature for you; it is for me. It also supports token-based authentication, so it's possible to use OrientDB as your primary means of authorizing/authenticating users. OrientDB also has LDAP integration. In contrast, ArangoDB supports only a very simple auth option.
Both systems have their own advantages, so choosing between them comes down to your own situation:
If you're building a small application, and you're a web developer optimizing for developer productivity, it will probably be easier to get up and running quickly with ArangoDB.
If you're building a larger application, which could potentially store many gigabytes or terabytes of data, or have many thousands of concurrent users, or have "enterprise" use cases, or need fine grained security controls, OrientDB is the one for you.
If you're storing RDF or similarly structured linked data, choose OrientDB.
If you're using Java, just choose OrientDB.
Note: This is (my opinion of) the state of play today, things change quickly and I would not underestimate the ruthless efficiency of the awesome team behind ArangoDB, I just think that it's not quite there yet :)
Charles Pick (codemix.com)
