We currently use Silverlight 4 with WCF services and are trying to read large arrays of user objects from a service. In our code it takes about 0.5 seconds (or less) to generate 700 objects arranged in a hierarchy (a lot of loops).
And it takes about 4-5 seconds for Silverlight/WCF to transfer that data - on localhost.
I've measured timings in my code and the service call, and used Fiddler to inspect the data (5 MB!). When I tried passing a simplified object with plain attributes (instead of nested lists, etc.), it produced far less data and was very quick - about a second.
I've read many articles on the subject - there's no simple way. The best options I could find are returning byte[] from the WCF method (with the types in a separate assembly), or fairly manual serializers (like protobuf) that require writing custom attributes, etc.
OK, I tried those. protobuf-net is extremely tedious (adding member numbers to 200 existing classes isn't fun) and v2 isn't out yet, and binaryMessageEncoding only reduced the payload from 5.5 MB to 4.5 MB - not much.
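For reference, the binaryMessageEncoding I tried is just a binding switch in config - roughly like this (service and contract names here are placeholders; the Silverlight client needs a matching customBinding in its ServiceReferences.ClientConfig):

```xml
<!-- web.config on the service side -->
<system.serviceModel>
  <bindings>
    <customBinding>
      <binding name="binaryHttpBinding">
        <binaryMessageEncoding />
        <httpTransport />
      </binding>
    </customBinding>
  </bindings>
  <services>
    <service name="MyApp.Web.UserService">
      <endpoint address=""
                binding="customBinding"
                bindingConfiguration="binaryHttpBinding"
                contract="MyApp.Web.IUserService" />
    </service>
  </services>
</system.serviceModel>
```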
But I can't believe there's no out-of-the-box WCF/Silverlight solution for streaming large amounts of data. Isn't it supposed to be a nice, modern technology for enterprise solutions?
How do I tell Silverlight/WCF to send my data faster and smaller, rather than 5 MB in 5 seconds? Can I just say in the config, "use a small and fast serializer"?
I've found the SharpSerializer package very easy to use for fast binary serialization in Silverlight: http://www.sharpserializer.com/en/index.html. The resulting serialized data is much smaller than with the DataContract serializer or other text-based serializers.
Does IIS have compression enabled? This will impact CPU, however, and you might need to double-check whether Silverlight honors the deflate HTTP header.
NVIDIA Triton vs TorchServe for SageMaker inference? When to recommend each?
Both are modern, production-grade inference servers. TorchServe is the default inference server for PyTorch models in the SageMaker Deep Learning Containers (DLCs). Triton is also supported for PyTorch inference on SageMaker.
Does anyone have a good comparison matrix for the two?
Important notes on where the two serving stacks differ:
TorchServe does not provide the Instance Groups feature that Triton does (that is, stacking many copies of the same model, or even of different models, onto the same GPU). This is a major advantage for both real-time and batch use cases, as the performance gain is almost proportional to the model replication count (i.e. 2 copies of the model get you almost twice the throughput and half the latency; check out a BERT benchmark of this here). It's hard to match a feature that is almost like getting 2+ GPUs for the price of one.
If you are deploying PyTorch DL models, odds are you often want to accelerate them with GPUs. TensorRT (TRT) is a compiler developed by NVIDIA that automatically quantizes and optimizes your model graph, which represents another huge speed-up, depending on the GPU architecture and the model. It is understandably probably the best way to automatically optimize your model to run efficiently on GPUs and make good use of Tensor Cores. Triton has native integration to run TensorRT engines, as they're called (it can even automatically convert your model to a TRT engine via the config file), while TorchServe does not (even though you can use TRT engines with it).
There is more parity between the two when it comes to other important serving features: both have dynamic batching support, you can define inference DAGs with both (not sure whether the latter works with TorchServe on SageMaker without a big hassle), and both support custom code/handlers instead of only being able to serve a model's forward function (see the Python-backend sketch after these notes).
Finally, MME on GPU (coming shortly) will be based on Triton, which is a valid argument for customers to get familiar with it so that they can quickly leverage this new feature for cost-optimization.
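To illustrate the custom code/handlers point, here is a minimal sketch of a handler for Triton's standard Python backend (model and tensor names are invented for the example):

```python
# model.py for a Triton Python-backend model, placed under <model_repo>/<model_name>/1/
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Minimal custom handler: reads INPUT0, doubles it, returns OUTPUT0."""

    def initialize(self, args):
        # args carries the model name, config, instance kind, etc.
        self.model_name = args["model_name"]

    def execute(self, requests):
        responses = []
        for request in requests:
            # Pull the input tensor declared in config.pbtxt.
            input0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            data = input0.as_numpy()

            # Any custom pre/post-processing or business logic goes here.
            output0 = pb_utils.Tensor("OUTPUT0", (data * 2).astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[output0]))
        return responses

    def finalize(self):
        pass
```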
Bottom line: I think Triton is just as easy (if not easier) to use, a lot more optimized/integrated for taking full advantage of the underlying hardware (and it will be updated to stay that way as newer GPU architectures are released, enabling an easy move to them), and in general it blows TorchServe out of the water performance-wise when its optimization features are used in combination.
Because I don't have enough reputation to reply in the comments, I'm writing this as an answer.
MME stands for Multi-Model Endpoints. MME lets multiple models share the GPU instances behind an endpoint, and it dynamically loads and unloads models based on the incoming traffic.
You can read more about it at this link.
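For a concrete feel, here is a rough sketch of deploying and invoking a multi-model endpoint with the SageMaker Python SDK (bucket, names, instance type and the base model are placeholders; this shows the existing CPU flavour, while the GPU/Triton variant mentioned above is the upcoming one):

```python
# Deploy several models behind one endpoint with the SageMaker Python SDK.
from sagemaker.multidatamodel import MultiDataModel

# `base_model` is an already-configured sagemaker.model.Model (container image, role, ...).
mme = MultiDataModel(
    name="my-multi-model-endpoint",              # placeholder name
    model_data_prefix="s3://my-bucket/models/",  # all model.tar.gz artifacts live under this prefix
    model=base_model,
)

predictor = mme.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# The model to invoke is chosen per request; it is loaded on demand
# and may be unloaded again if it goes cold.
result = predictor.predict(payload, target_model="model-a.tar.gz")
```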
Is it possible to manually define the logic of the serialization used for AppEngine Datastore?
I am assuming Google is using reflection to do this in a generic way. This works, but proves to be quite slow. I'd be willing to write (and maintain) quite a bit of code to speed up the serialization/deserialization of datastore objects (I have large objects and this consumes a significant percentage of the time).
The datastore uses Protocol Buffers internally, and there is no way around that, as it's the only way your application can communicate with the datastore.
(The implementation can be found in the SDK/google/appengine/datastore/entity_pb.py)
If you think (de)serialization is too slow in your case, you have two choices:
Move to a lower-level DB API. Next to the two well-documented ext.db and ext.ndb APIs, there is another API at google.appengine.datastore. It doesn't have all the fancy model machinery and provides a simple (and hopefully faster) dictionary-like API. This will keep your datastore layout compatible with the other two DB APIs.
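A rough sketch of what that dictionary-like API looks like (in the Python 2 GAE SDK it is usually imported as google.appengine.api.datastore; kind and property names here are made up):

```python
# Low-level, model-free datastore access.
from google.appengine.api import datastore

# Write: an Entity behaves like a dict of property name -> value.
person = datastore.Entity('Person')
person.update({'name': 'Alice', 'age': 30})
key = datastore.Put(person)

# Read it back by key.
same_person = datastore.Get(key)

# Simple filtered query, still returning dict-like entities.
query = datastore.Query('Person', {'age >=': 21})
adults = query.Get(100)   # fetch up to 100 matching entities
```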
Serialize the object yourself, and store it in a dummy entity consisting of just a single text or blob field. But you'll probably need to duplicate some data into the base entity, as you cannot filter or sort on data inside your self-serialized blob.
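A sketch of that second option using pickle and ndb (property names are made up; a blob property is used rather than a text field because pickle produces bytes, and only the duplicated owner field stays filterable):

```python
# Store a self-serialized object in a blob, duplicating only the fields you query on.
import pickle
from google.appengine.ext import ndb


class SerializedBlob(ndb.Model):
    owner = ndb.StringProperty()   # duplicated so it can be filtered/sorted on
    payload = ndb.BlobProperty()   # opaque, self-serialized data


def save(owner, obj):
    return SerializedBlob(owner=owner,
                          payload=pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)).put()


def load(key):
    return pickle.loads(key.get().payload)
```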
I am writing an application which parses a large file, generates a large amount of data, and does some complex visualization with it. Since all this data can't be kept in memory, I did some research and I'm starting to consider embedded databases as a temporary container for it.
My question is: is this a traditional way of solving this problem? And is an embedded database (other than structuring data) supposed to manage data by keeping in memory only a subset (like a cache), while the rest is kept on disk? Thank you.
Edit: to clarify: I am writing a desktop application. The application takes as input a file that is hundreds of MB in size. After reading the file, the application generates a large number of graphs which will be visualized. Since the graphs may have a very large number of nodes, they may not fit into memory. Should I save them into an embedded database which will take care of keeping only the relevant data in memory? (Do embedded databases do that?) Or should I write my own sophisticated module that does this?
Tough question - but I'll share my experience and let you decide if it helps.
If you need to retain the output from processing the source file, and you use that to produce multiple views of the derived data, then you might consider using an embedded database. The reasons to use an embedded database (IMHO):
To take advantage of RDBMS features (ACID, relationships, foreign keys, constraints, triggers, aggregation...)
To make it easier to export the data in a flexible manner
To enable access to your processed data to external clients (known format)
To allow more flexible transformation of the data when preparing for viewing
Factors which you should consider when making the decision:
What is the target platform(s) (windows, linux, android, iPhone, PDA)?
What technology base? (Java, .Net, C, C++, ...)
What resource constraints are expected or need to be designed for? (RAM, CPU, HD space)
What operational behaviours do you need to take into account (connected to network, disconnected)?
On the typical modern desktop there is enough spare capacity to handle most operations. On eeePCs, PDAs, and other portable devices, maybe not. On embedded devices, very likely not. The language you use may have built-in features to help with memory management - maybe you can take advantage of those. The connectivity aspect (stateful / stateless / etc.) may impact how much you really need to keep in memory at any given point.
If you are dealing with really big files, then you might consider a streaming approach, so that you only have a small portion of the overall data in memory at a time - but that doesn't really mean you should (or shouldn't) use an embedded database. Straight text or binary files could work just as well (record based, column based, line based... whatever).
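As a trivial sketch of that streaming idea (pure Python, assuming a line-per-record text input; the field layout is invented):

```python
# Stream a large line-based file, keeping only running aggregates in memory.
from collections import defaultdict

def aggregate(path):
    totals = defaultdict(float)
    with open(path) as f:
        for line in f:                       # the file is never loaded whole
            key, value = line.rsplit(',', 1) # e.g. "node_id,weight" records
            totals[key] += float(value)
    return totals
```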
Some databases will allow you more effective ways to interact with the data once it is stored - it depends on the engine. I find that if you have a lot of aggregation required in your base files (by which I mean the files you generate initially from the original source) then an RDBMS engine can be very helpful to simplify your logic. Other options include building your base transform and then adding additional steps to process that into other temporary stores for each specific view, which are then in turn processed for rendering to the target (report?) format.
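For instance, pushing the generated records into an embedded SQLite file and letting SQL do the aggregation looks roughly like this (a sketch; table and column names are invented, and `generated_rows` stands in for whatever your parser produces):

```python
# Let an embedded database do the aggregation instead of hand-rolled loops.
import sqlite3

generated_rows = [('a', 1.0), ('a', 2.5), ('b', 3.0)]  # stand-in for parser output

conn = sqlite3.connect('derived.db')   # on-disk, so it doesn't have to fit in RAM
conn.execute('CREATE TABLE IF NOT EXISTS samples (node TEXT, weight REAL)')
conn.executemany('INSERT INTO samples VALUES (?, ?)', generated_rows)
conn.commit()

# One query replaces a pile of grouping/summing code.
for node, total in conn.execute('SELECT node, SUM(weight) FROM samples GROUP BY node'):
    print(node, total)
```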
Just a stream-of-consciousness response - hope that helps a little.
Edit:
Per your further clarification, I'm not sure an embedded database is the direction you want to take. You either need to make some sort of simplifying assumptions for rendering your graphs or investigate methods like segmentation (render sections of the graph and then cache the output before rendering the next section).
What's the best way to store large JSON files in a database? I know about CouchDB, but I'm pretty sure that won't support files of the size I'll be using.
I'm reluctant to just read them off of disk, because of the time required to read and then update them. The file is an array of ~30,000 elements, so I think storing each element separately in a traditional database would kill me when I try to select them all.
I have lots of documents in CouchDB that exceed 2 MB and it handles them fine. Those limits are outdated.
The only caveat is that the default JavaScript view server has a pretty slow JSON parser, so view generation can take a while with large documents. You can use my Python view server with a C-based JSON library (jsonlib2, simplejson, yajl) or use the built-in Erlang views, which don't even hit JSON serialization, and view generation will be plenty fast.
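For context, switching view servers means registering a Python query server in CouchDB's local.ini and writing views as Python functions; a rough sketch, assuming the couchdb-python package (which ships the couchpy query server) and an invented document schema:

```python
# local.ini (CouchDB server config) - path depends on where couchpy is installed:
#   [query_servers]
#   python = /usr/local/bin/couchpy
#
# A view map function written for the Python view server; it emits one row per
# matching document and never touches the slow JavaScript JSON parser.
def map(doc):
    if doc.get('type') == 'user':
        yield doc['name'], 1
```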
If you intend to access specific elements one (or several) at a time, there's no way around breaking the big JSON into traditional DB rows and columns.
If you'd like to access it in one shot, you can convert it to XML and store that in the DB (maybe even compressed - XML is highly compressible). Most DB engines support storing an XML object. You can then read it in one shot and, if needed, translate it back to JSON using forward-reading approaches like SAX or any other efficient XML-reading technology.
But as @therefromhere commented, you could always save it as one big string (I would again check whether compressing it helps).
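The compression part is cheap to try; a small sketch of the compress-before-storing idea (sqlite3 is used only as a stand-in for whatever DB you pick, and the JSON is stored directly rather than converted to XML):

```python
# Round-trip a large JSON document through zlib into a BLOB column.
import json, sqlite3, zlib

db = sqlite3.connect('docs.db')
db.execute('CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body BLOB)')

def save_doc(doc_id, obj):
    blob = zlib.compress(json.dumps(obj).encode('utf-8'))
    db.execute('INSERT OR REPLACE INTO docs VALUES (?, ?)', (doc_id, blob))
    db.commit()

def load_doc(doc_id):
    (blob,) = db.execute('SELECT body FROM docs WHERE id = ?', (doc_id,)).fetchone()
    return json.loads(zlib.decompress(blob).decode('utf-8'))
```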
You don't really have a variety of choices here: you can cache them in RAM using something like memcached, or push them to disk, reading and writing them with a database (an RDBMS like PostgreSQL/MySQL, or a document-oriented database like CouchDB). The only real alternative is a hybrid system that caches the most frequently accessed documents in memcached for reading, which is how a lot of sites operate.
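A minimal read-through cache along those lines (assuming the python-memcached client, a locally running memcached, and a `load_from_db` function you supply):

```python
# Serve hot documents from memcached, falling back to the database on a miss.
import json
import memcache                          # python-memcached package

mc = memcache.Client(['127.0.0.1:11211'])

def get_document(doc_id, load_from_db, ttl=300):
    cached = mc.get('doc:%s' % doc_id)
    if cached is not None:
        return json.loads(cached)
    doc = load_from_db(doc_id)           # hit the RDBMS/CouchDB only on a cache miss
    mc.set('doc:%s' % doc_id, json.dumps(doc), time=ttl)
    return doc
```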
2+ MB isn't a massive deal for a database, and provided you have plenty of RAM it will do an intelligent enough job of caching and using your RAM effectively. Do you have a frequency pattern for when and how often these documents are accessed, and how many users you have to serve?
I have a Silverlight client accessing data through ADO.NET Data Services. One of my queries has a number of expand clauses and gets back quite a number of entries. The XML response is enormous, and I'm looking for ways to make this more efficient.
I have tried:
Paging (not an option for this behaviour)
HTTP compression (some client PCs are running IE6)
Doing the expands as separate queries and joining the entities later (this improved things a little)
Is it possible to use JSON as the transport format with the Silverlight client? I haven't found anything about this on the web...
You can see a demonstration of using JSON in Silverlight at the link below:
http://timheuer.com/blog/archive/2008/05/06/use-json-data-in-silverlight.aspx
I am not sure how much performance gain is achieved by using JSON, but I definitely remember that ADO.NET Data Services supports JSON.
Well. I got a chance to talk to Tim Heuer about this, who awesomely went and asked Pablo Castro for me. Thanks Tim!
JSON can't be used by the Silverlight client, but Silverlight 3 will use binary XML by default to talk to web services. Rawr.
One other thing I worked out for myself is that using expand can sometimes result in a lot more data than performing multiple requests. If you batch a few queries together and then hand-stitch the objects together yourself, you can save quite a bit of XML.