How to use Cloud ML Engine for Context Aware Recommender System - google-app-engine

I am trying to build a context-aware recommender system with Cloud ML Engine, using the context pre-filtering method (as described in slide 55, solution a), and I am following this Google Cloud tutorial (part 2) to build a demo. For the purposes of the demo I have split the dataset by timestamp into Weekday and Weekend contexts and into Noon and Afternoon contexts.
In practice I will train four models, so that I can context-filter by Weekday-unknown, Weekend-unknown, unknown-Noon, unknown-Afternoon, Weekday-Afternoon, Weekday-Noon... and so on. The idea is to get predictions from all the relevant models for a user and then weight the resulting recommendations according to what is known about the context ("unknown" meaning that all context models are used and a weighted result is returned).
I need something that responds fast, and it seems I will unfortunately need some kind of middleware if I don't want to do the weighting in the front end.
I know that App Engine has a prediction mode where it keeps the models in RAM, which guarantees fast responses since you don't have to bootstrap the prediction models; resolving the context would then be fast.
However, is there a simpler solution that would also guarantee similar performance on Google Cloud?
The reason I am using Cloud ML Engine is that building a context-aware recommender system this way hugely increases the amount of hyperparameter tuning. I don't want to do that manually; instead I want the Cloud ML Engine Bayesian hypertuner to do the job, so that I only need to tune the parameter ranges one to three times per context model (with an automated script). This saves a lot of data scientist time whenever the dataset is reiterated.
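To make the weighting idea concrete, here is a minimal sketch of what I have in mind; the context keys, model names, and weights are made up for illustration, and predict() stands in for whatever serving call ends up being used:

```python
# Minimal sketch of blending per-context model predictions; everything here is illustrative.
CONTEXT_WEIGHTS = {
    # (day_type, day_part) -> {model_name: weight}
    ('weekday', 'noon'): {'weekday_noon': 1.0},
    ('weekday', None):   {'weekday_noon': 0.5, 'weekday_afternoon': 0.5},
    (None, 'afternoon'): {'weekday_afternoon': 0.5, 'weekend_afternoon': 0.5},
    (None, None):        {'weekday_noon': 0.25, 'weekday_afternoon': 0.25,
                          'weekend_noon': 0.25, 'weekend_afternoon': 0.25},
}

def blended_recommendations(user_id, candidate_items, context, predict):
    """Weight each relevant context model's scores by how much of the context is known."""
    weights = CONTEXT_WEIGHTS[context]
    scores = dict.fromkeys(candidate_items, 0.0)
    for model_name, weight in weights.items():
        for item, score in predict(model_name, user_id, candidate_items).items():
            scores[item] += weight * score
    return sorted(scores, key=scores.get, reverse=True)
```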

There are four possible solutions:
Option 1: Learn 4 models and use SavedModel to save them. Then, create a 5th model that restores the 4 saved models. This model has no trainable weights. Instead, it simply computes the context, applies the appropriate weight to each of the 4 saved models and returns the value. It is this 5th model that you will deploy.
Option 2: Learn a single model. Make the context a categorical input to your model, i.e. follow the approach in https://arxiv.org/abs/1606.07792
Option 3: Use a separate App Engine service that computes the context, invokes the underlying 4 services, weighs them and returns the result.
Option 4: Use an App Engine service written in Python that loads up all four saved models, invokes them, weights them and returns the result.
Option 1 involves more coding and is quite tricky to get right.
Option 2 would be my choice, although it changes the model formulation from what you desire. If you go this route, here is sample code on MovieLens that you can adapt: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/movielens
Option 3 introduces more latency because of the additional network overhead.
Option 4 reduces the network latency of option 3, but you lose the parallelism. You will have to experiment between options 3 and 4 to see which provides better overall performance (see the sketch below).
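To make options 3 and 4 a little more concrete, here is a rough sketch of the prediction-and-weighting piece using the standard Cloud ML Engine online-prediction client; the project name, model names, weights, and the 'score' field of the response are all placeholders. Option 4 would load the four SavedModels in-process instead of making the network calls.

```python
from googleapiclient import discovery

PROJECT = 'my-project'                      # placeholder
CONTEXT_MODELS = {'weekday_noon': 0.5,      # placeholder per-context models and weights,
                  'weekday_afternoon': 0.5} # chosen from the request's context

ml = discovery.build('ml', 'v1')

def predict(model, instances):
    """One online-prediction call against a model deployed on Cloud ML Engine."""
    name = 'projects/{}/models/{}'.format(PROJECT, model)
    response = ml.projects().predict(name=name, body={'instances': instances}).execute()
    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']

def weighted_predictions(instances):
    """Call each relevant context model and blend the scores by the context weights."""
    blended = [0.0] * len(instances)
    for model, weight in CONTEXT_MODELS.items():
        preds = predict(model, instances)
        blended = [b + weight * p['score'] for b, p in zip(blended, preds)]
    return blended
```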

Related

NVIDIA Triton vs TorchServe for SageMaker Inference

NVIDIA Triton vs TorchServe for SageMaker inference? When should each be recommended?
Both are modern, production-grade inference servers. TorchServe is the default DLC inference server for PyTorch models. Triton is also supported for PyTorch inference on SageMaker.
Does anyone have a good comparison matrix for the two?
Important notes on where the two serving stacks differ:
TorchServe does not provide the Instance Groups feature that Triton does (that is, stacking many copies of the same model, or even different models, onto the same GPU). This is a major advantage for both real-time and batch use cases, as the performance increase is almost proportional to the model replication count (i.e. 2 copies of the model get you almost twice the throughput and half the latency; check out a BERT benchmark of this here). It is hard to match a feature that is almost like having 2+ GPUs for the price of one.
If you are deploying PyTorch DL models, odds are you often want to accelerate them with GPUs. TensorRT (TRT) is a compiler developed by NVIDIA that automatically quantizes and optimizes your model graph, which represents another huge speed-up, depending on the GPU architecture and the model. It is understandably probably the best way of automatically optimizing your model to run efficiently on GPUs and make good use of Tensor Cores. Triton has native integration to run TensorRT engines, as they are called (it can even automatically convert your model to a TRT engine via a config file), while TorchServe does not (even though you can use TRT engines with it).
There is more parity between both when it comes to other important serving features: both have dynamic batching support, you can define inference DAG's with both (not sure if the latter works with TorchServe on SageMaker without a big hassle), and both support custom code/handlers instead of just being able to serve a model's forward function.
Finally, MME on GPU (coming shortly) will be based on Triton, which is a valid argument for customers to get familiar with it so that they can quickly leverage this new feature for cost-optimization.
Bottom line, I think Triton is just as easy (if not easier) to use, a lot more optimized/integrated for taking full advantage of the underlying hardware (and it will be kept that way as newer GPU architectures are released, enabling an easy move to them), and in general it blows TorchServe out of the water performance-wise when its optimization features are used in combination.
Because I don't have enough reputation to reply in comments, I am writing this as an answer.
MME stands for Multi-Model Endpoints. MME enables sharing GPU instances behind a single endpoint across multiple models, and dynamically loads and unloads models based on the incoming traffic.
You can read more at this link.
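For illustration, this is roughly what an MME request looks like from the client side with boto3; the endpoint name and model artifact name below are placeholders:

```python
import boto3

runtime = boto3.client('sagemaker-runtime')

# With a multi-model endpoint the same endpoint hosts many model artifacts;
# TargetModel selects which artifact to (lazily) load and route this request to.
response = runtime.invoke_endpoint(
    EndpointName='my-gpu-mme-endpoint',      # placeholder
    TargetModel='recommender-v3.tar.gz',     # placeholder artifact under the MME S3 prefix
    ContentType='application/json',
    Body=b'{"inputs": [[1.0, 2.0, 3.0]]}',
)
print(response['Body'].read())
```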

Google App Engine ndb dynamic indexes alternative

Background
I'm creating an application that allows users to define their own datasets with custom properties (and property types).
A user could, while interacting with the application, define a dataset that has the following columns:
Name: String
Location: Geo
Weight: float
Notes: Text (not indexed)
How many: int
etc...
While there will be restrictions on the total number of properties (say 10-20 or something), there are no restrictions on the property types.
Google's ndb datastore allows this to happen and will auto-generate simple indexes for searches involving combinations of equality operators and no sorts, or only sorts.
Ideal
Multiple sorts
Equality and sorts
Combinations of inequalities
I'm trying to determine if I should use NDB at all, or switch to something else (SQL seems extremely expensive, comparatively, which is one of the reasons I'm hesitant).
For the multiple sorts, I could write server-side code that queries for the first, then sorts in memory by the second, third, etc. I could also query for the data and do the sorting on the client side.
For the combinations of inequalities, I could do the same (more or less).
These solutions are obviously not performant, and won't scale if there are a large number of items that match the first query.
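For reference, here is roughly what that workaround looks like with an Expando model; the property names are adapted from the example columns above (spaces replaced with underscores), and the 1000-entity fetch is an arbitrary cap:

```python
from google.appengine.ext import ndb

class Row(ndb.Expando):
    """One row of a user-defined dataset; the user's columns become dynamic properties."""

# A single filter on one (dynamic) property is served by the built-in indexes...
query = Row.query(ndb.GenericProperty('Weight') > 10.0)
rows = query.fetch(1000)

# ...while the remaining inequalities and the second/third sorts happen in memory,
# which only stays cheap as long as the first filter keeps the result set small.
rows = [r for r in rows if getattr(r, 'How_many', 0) < 5]
rows.sort(key=lambda r: (getattr(r, 'Name', ''), -getattr(r, 'Weight', 0.0)))
```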
BaaS providers like Kinvey (which runs on GAE unless I'm quite mistaken) use schemaless databases and allow you to both create them on the fly and make compound, complicated queries over the data.
Sanity Check:
Trying to force NDB into what I want seems like a bad idea, unless there's something I'm overlooking (possible) that would make this more doable. My solutions would work, but wouldn't scale well (though I'm not sure how far: would they work for 10k objects? 100k? 1M?).
Options I've investigated:
Kinvey, which charges by user and by data stored (since they just changed their pricing model), and ends up costing quite a bit.
Stackmob is also nice, but cloud code is crazy expensive ($200/month), and hosting and such all just costs more. Tasks cost more. The price looks very high.
Question:
I've done a fair bit of investigating, but there are just so many options. Assuming that my sanity check is correct (if it's not, and doing in-memory operations is sort-of-scalable, then fantastic!), are there other options out there that are inexpensive (BaaS providers get quite expensive once the applications scale), fast, and easily scalable that would solve my problem? Being able to run custom code in the cloud easily (and cheaply) and have API calls and bandwidth cost next to nothing is one of the reasons I've been investigating GAE (full hosting provider, any code I want in the cloud, etc).

Activity Feed with Riak

This week I read an interesting article which explains how the authors implemented an activity feed. Basically, they use two approaches to handle activities, which I'm adapting to my scenario. So suppose we have a user foo who has a certain number (x) of followers:
if x < 500, then the activity is copied to every follower's feed
this means slow writes, fast reads
if x > 500, only a link is made between foo and his followers
in theory, fast writes, but slow reads
So when a user accesses their activity feed, the server fetches and merges all the data: fast lookups in their own copied activities, plus queries across the links. If a timeline has a limit of 20, I fetch 10 of each and merge.
I'm trying to do this with Riak and its Linking feature, so this is my question: is linking faster than copying? Is my idea of the architecture good enough? Are there other solutions and/or technologies I should look at?
PS: I'm not implementing an activity feed for production; it's just for learning how to implement one that performs well, and to use Riak a bit.
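Roughly, the write path I have in mind looks like this with the Python Riak client (bucket names, key formats, the 500 threshold, and the activity dict are all placeholders for the purpose of the sketch):

```python
import riak

client = riak.RiakClient()
activities = client.bucket('activities')   # one object per activity
feeds = client.bucket('feeds')             # key lists: 'feed:<user>' and 'recent:<author>'

def _prepend(bucket, key, value, limit=100):
    """Prepend a value to a bounded JSON list stored under `key`."""
    obj = bucket.get(key)
    obj.data = ([value] + (obj.data or []))[:limit]
    obj.store()

def publish(author, followers, activity):
    # Store the activity once, under its own key, and remember it in the author's
    # own "recent" list so linked followers can find it at read time.
    key = '%s:%s' % (author, activity['ts'])
    activities.new(key, data=activity).store()
    _prepend(feeds, 'recent:' + author, key)

    if len(followers) < 500:
        # Fan-out on write: copy the activity key into every follower's feed.
        for follower in followers:
            _prepend(feeds, 'feed:' + follower, key)
    # else: fan-out on read -- followers resolve the 'recent:<author>' lists of the
    # big authors they follow and merge everything at read time.
```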
Two thoughts.
1) No, Linking (in the sense of Riak Link Walking) is very likely not the right way to implement this. For one, each link is stored as a separate HTTP header, and the HTTP spec recommends limiting how many header fields you send. (Although, to be fair, in tests you can use upwards of 1000 links in a header with Riak and it seems to work fine. Still not recommended.) More importantly, querying those links via the Link Walking API actually uses MapReduce on the backend, and is fairly slow for the kind of usage you're intending.
This is not to say that you can't store JSON objects that are lists of links, sure, that's a valid approach. I'm just recommending against using Riak links for this.
2) As for how to properly implement it, that's a harder question and depends on your traffic and use case. But your general approach is valid -- copy the feed below some threshold X of followers (whether X is 500 or much smaller should be determined in testing), and link when the number of followers is greater than X.
How should you link? You have 3 choices, all with tradeoffs. 1) Use Secondary Indices (2i), 2) Use Search, or 3) Use links "manually", meaning, store JSON documents with URLs that you dereference manually (versus using link walking queries).
I highly recommend watching this video: http://vimeo.com/album/2258285/page:2/sort:preset/format:thumbnail (Building a Social Application on Riak), by the Clipboard engineers, to see how they solved this problem. (They used Search for linking, basically).
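A rough sketch of option 3 ("manual" links), reusing the bucket layout from the sketch in the question above: each activity lives under its own key, the feed/recent objects are just JSON lists of keys, and you dereference them with plain GETs instead of Link Walking:

```python
import riak

client = riak.RiakClient()
activities = client.bucket('activities')
feeds = client.bucket('feeds')

def read_feed(user, linked_authors, limit=20):
    """Merge the user's copied feed with the 'recent' lists of linked (big) authors."""
    keys = (feeds.get('feed:' + user).data or [])[:limit]
    for author in linked_authors:
        keys += (feeds.get('recent:' + author).data or [])[:limit]

    # Dereference each key with a plain GET, then merge by timestamp.
    entries = [activities.get(k).data for k in keys]
    entries = [e for e in entries if e is not None]
    entries.sort(key=lambda e: e['ts'], reverse=True)   # 'ts' field is an assumption
    return entries[:limit]
```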

My App Engine app is showing way more write ops than expected. How can I diagnose what's going on?

95% of my app's costs are related to write operations. In the last few days I've paid $150, and I wouldn't consider the amount of data stored to be that huge.
First I suspected that there might be a lot of write operations because of exploding indexes, but I read that this usually happens when you have two list properties in the same model, and during the model design phase those were limited to at most one per model.
I also went through my models and passed indexed=False to all properties that I will not need to order or filter by.
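For reference, this is roughly what that change looks like (the model and property names here are just examples); every indexed property adds index write operations on each put, and indexed repeated properties multiply them:

```python
from google.appengine.ext import ndb

class Note(ndb.Model):
    title = ndb.StringProperty()                     # indexed by default: extra index writes per put
    body = ndb.TextProperty()                        # TextProperty is never indexed
    created = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
    tags = ndb.StringProperty(repeated=True, indexed=False)  # repeated + indexed would multiply writes
```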
One other thing I should disclose about my app is that I have bundled write operations, in the sense that for some entities, when they need to be stored, I call a proxy function that stores that entity along with derivative entities. Not in a transactional way, since it's not a big deal if there's a write failure every now and then. But I don't really see a way around that, given how the models are related.
So I'm eager to hear whether someone else has faced this problem, and which approach/tools/etc. they used to solve it. Or whether there are just some general things one can do.
You can try the Appstats tool. It can show you datastore call statistics, among other things.
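If you're on the Python runtime, enabling it is roughly this in appengine_config.py (with the appstats builtin turned on in app.yaml, the console is then served at /_ah/stats):

```python
# appengine_config.py -- wraps the WSGI app so every datastore/memcache/urlfetch RPC
# is recorded; the recorded traces show which code paths generate the write ops.
def webapp_add_wsgi_middleware(app):
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
```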

Document/Image Database Repository Design Question

Question:
Should I write my application to directly access a database image repository, or should I write a middleware piece to handle document requests?
Background:
I have a custom Document Imaging and Workflow application that currently stores about 15 million documents/document images (90%+ single page, group 4 tiffs, the rest PDF, Word and Excel documents). The image repository is a commercial, 3rd party application that is very expensive and frankly has too much overhead. I just need a system to store and retrieve document images.
I'm considering moving the imaging directly into a SQL Server 2005 database. The indexing information is very limited - basically 2 index fields. It's a life insurance policy administration system, so I index images with a policy number and a system-wide unique ID number. There are other index values, but they're stored and maintained separately from the image data. Those index values give me the ability to look up the unique ID for individual image retrieval.
The database server is a dual-quad core windows 2003 box with SAN drives hosting the DB files. The current image repository size is about 650GB. I haven't done any testing to see how large the converted database will be. I'm not really asking about the database design - I'm working with our DBAs on that aspect. If that changes, I'll be back :-)
The current system to be replaced is obviously a middleware application, but it's a very heavyweight system spread across 3 windows servers. If I go this route, it would be a single server system.
My primary concerns are scalability and performance - heavily weighted toward performance. I have about 100 users, and usage growth will probably be slow for the next few years.
Most users are primarily read users - they don't add images to the system very often. We have a department that handles scanning and otherwise adding images to the repository. We also have a few other applications that receive documents (via FTP) and insert them into the repository automatically as they are received, either with full index information or as "batches" that a user reviews and indexes.
Most (90%+) of the documents/images are very small, < 100K, probably < 50K, so I believe that storing the images in the database file will be the most efficient approach, rather than getting SQL 2008 and using FILESTREAM.
Oftentimes scalability and performance are ultimately married to each other, in the sense that six months from now management comes back and says "Function Y in Application X is running unacceptably slowly; how do we speed it up?" And all too often the answer is to upgrade the back-end solution. And when it comes to upgrading back ends, it's almost always going to be less expensive to scale out than to scale up in terms of hardware.
So, long story short, I would recommend building a middleware app that specifically handles incoming requests from the user app and then routes them to the appropriate destination. This will sufficiently abstract your front-end user app from the back end storage solution so that when scalability does become an issue only the middleware app will need to be updated.
This is straightforward. Write the application to an interface, use some kind of factory mechanism to supply that interface, and implement that interface however you want.
Once you're happy with your interface, then the application is (mostly) isolated from the implementation, whether it's talking straight to a DB or to some other component.
Thinking ahead a bit on your interface design, while doing bone-stupid "it's simple, it works here, it works now" implementations, offers a good balance of future-proofing the system while not necessarily over-engineering it.
It's easy to argue you don't even need an interface at this juncture, rather just a simple class that you instantiate. But if your contract is well defined (i.e. the interface or class signature), that is what protects you from change (such as redoing the back end implementation). You can always replace the class with an interface later if you find it necessary.
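Sketching the idea in Python (the names, the two-column schema, and the pyodbc-style connection are illustrative only; the same shape works in any language): the application talks only to the repository contract, and a small factory decides whether that is the direct-to-database implementation or something else later.

```python
class DocumentRepository(object):
    """The contract the application codes against."""

    def get(self, unique_id):
        """Return the stored document/image bytes, or None."""
        raise NotImplementedError

    def put(self, policy_number, unique_id, blob):
        """Store a document/image under its two index values."""
        raise NotImplementedError


class SqlServerRepository(DocumentRepository):
    """Bone-stupid direct-to-database implementation; replaceable later without touching callers."""

    def __init__(self, connection):
        self._conn = connection   # e.g. a pyodbc connection (assumption)

    def get(self, unique_id):
        cur = self._conn.cursor()
        cur.execute("SELECT image FROM documents WHERE unique_id = ?", (unique_id,))
        row = cur.fetchone()
        return row[0] if row else None

    def put(self, policy_number, unique_id, blob):
        cur = self._conn.cursor()
        cur.execute(
            "INSERT INTO documents (policy_number, unique_id, image) VALUES (?, ?, ?)",
            (policy_number, unique_id, blob))
        self._conn.commit()


def make_repository(backend, connection=None):
    """Factory: the rest of the app never needs to know which back end it got."""
    if backend == 'sqlserver':
        return SqlServerRepository(connection)
    raise ValueError('unknown backend: %r' % backend)
```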
As far as scalability, test it. Then you know not only if you may need to scale, but perhaps when as well. "Works great for 100 users, problematic for 200, if we hit 150 we might want to consider taking another look at the back end, but it's good for now."
That's due diligence and a responsible design tactic, IMHO.
I agree with gabriel1836. An added benefit is that you could run a hybrid system for a time, since you aren't going to convert 14 million documents from your proprietary system to your home-grown system overnight.
Also, I would strongly encourage you to store the documents outside of a database. Store them on a file system (local, SAN, NAS it doesn't matter) and store pointers to the documents in the database.
I'd love to know what document management system you are using now.
Also, don't underestimate the effort of replacing the capture functionality (scanning and importing) provided by the proprietary system.
