Why is Trust Region Policy Optimization an on-policy algorithm?

I'm wondering why Trust Region Policy Optimization is an on-policy algorithm.
In my opinion, in TRPO we sample with the old policy, update the new policy, and apply importance sampling to correct the bias. That makes it look more like an off-policy algorithm.
But recently, I read a paper which said:
In contrast to off-policy algorithms, on-policy methods require updating function approximators according to the currently followed policy. In particular, we will consider Trust Region Policy Optimization, an extension of traditional policy gradient methods using the natural gradient direction.
Is there something I am misunderstanding?

The key feature of on-policy methods is that they must use the estimated policy in order to interact with the environment. Trust Region Policy Optimization does exactly this: it acquires samples (i.e., interacts with the environment) using the current policy, then updates the policy and uses the new policy estimate in the next iteration.
So the algorithm uses the estimated policy during the learning process, which is the definition of an on-policy method.
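To make this concrete, here is a minimal sketch of the TRPO-style on-policy loop (the helpers collect_trajectories, estimate_advantages and trust_region_update are hypothetical placeholders, not any particular library's API). The importance-sampling ratio only bridges the single update between the policy that just collected the batch and the slightly changed policy being optimized; the batch is discarded afterwards, so data from older policies is never reused, which is why TRPO still counts as on-policy.

# Sketch of TRPO's on-policy loop (hypothetical helpers, for illustration only).
def trpo_training_loop(env, policy, n_iterations, batch_size):
    for _ in range(n_iterations):
        # 1) Interact with the environment using the CURRENT policy.
        batch = collect_trajectories(env, policy, batch_size)    # hypothetical
        # 2) Freeze the sampling policy's log-probs; they become pi_old in the
        #    surrogate objective E[(pi_new / pi_old) * advantage].
        old_log_probs = policy.log_prob(batch.states, batch.actions)
        advantages = estimate_advantages(batch)                   # hypothetical
        # 3) Update the policy within a KL trust region, then DISCARD the batch.
        policy = trust_region_update(policy, batch, old_log_probs, advantages)
    return policy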

Related

Consensus algorithm check list

I wrote a new consensus algorithm. Is there a self-evaluation checklist I can run through to see if it meets the basic requirements? For example, is it resistant to double-spend attacks? How does it scale?
I reviewed this entire algorithm. Though the idea is great, it feels a bit incomplete. The self-evaluation checklist below is based on the requirements and safety measures taken by well-established blockchains such as ETH and BTC.
System Criteria:
What is the required storage capacity of the system?
-- RAM usage, bandwidth
What happens when the entire network goes offline?
Algorithm evaluation:
Is this algorithm scalable? Scalable as in still operable when the number of users grows exponentially.
How long does it take the miners to reach a 2/3 consensus?
Are there safety measures for the users' funds?
How can a user transfer funds safely? (e.g. cryptography that ensures only the authorised entities can move the funds)
Architecture evaluation:
Is it decentralized, transparent and immutable?
User evaluation:
Is there enough incentive for a miner/validator to validate the transactions?
Is there enough incentive for a "new" miner/validator to join the network?
Is it possible for a single entity to dominate the network?
What safety measures are in place to prevent blind/unreliable transfer of data?
Resistance to attacks evaluation:
Is the algorithm resistant to double-spend attacks, eclipse attacks, Sybil attacks (identity theft), and 67% attacks?
Is there a way for the honest users to defend against such attacks? If not, how likely is it that an attacker succeeds after attacking the blockchain?
As an attacker, what is the weakness of this algorithm? Once something is confirmed by 2/3 it is unchangeable, so how can you obtain that 2/3 vote?
These are some conditions that came to my mind while reading the algorithm description and that were left unanswered. A consensus algorithm should take into account the maximum throughput and latency of current systems in order to give a holistic picture of how it evades attacks and secures its users. If it fails to do either, it will not fly in the market, because a weak algorithm makes the whole network untrustworthy. To make sure it is not, such questions should be asked in addition to the blockchain- and algorithm-specific questions that would arise in a user's mind when deciding whether to join a network. At the end of the day, everyone wants to keep their money safe, secure, and hidden from the general public to avoid any and all kinds of attack.
I'll admit I didn't read it too carefully, but I was looking at how the document handles the CAP theorem.
There is a statement in your doc: "since they (validators) are looking at the full blockchain picture". This statement is never true in a distributed system.
The second statement is "Once 2/3 of the validators approve an item": who decides that 2/3 has been reached? When does the customer know that the transaction is good? It seems the system is not very stable and will come to a halt quite often.
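To make the 2/3 question concrete, here is a minimal sketch (made-up names, no real protocol) of the local decision a node would have to take; the hard part it glosses over is getting every node to agree on the validator set and on which votes have been seen, which is exactly where the CAP trade-offs bite.

from fractions import Fraction

def quorum_reached(approvals, validators, threshold=Fraction(2, 3)):
    """True once strictly more than `threshold` of the known validators approved."""
    if not validators:
        return False
    counted = approvals & validators          # ignore votes from unknown identities
    return Fraction(len(counted), len(validators)) > threshold

validators = {f"v{i}" for i in range(10)}     # made-up validator set
approvals = {f"v{i}" for i in range(7)}       # 7 of 10 have approved
print(quorum_reached(approvals, validators))  # True: 7/10 > 2/3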
Looking forward to other comments from the community :)

NVIDIA Triton vs TorchServe for SageMaker Inference

NVIDIA Triton vs TorchServe for SageMaker inference? When to recommend each?
Both are modern, production-grade inference servers. TorchServe is the default inference server for PyTorch models in the SageMaker Deep Learning Containers (DLCs). Triton is also supported for PyTorch inference on SageMaker.
Does anyone have a good comparison matrix for the two?
Important notes on where the two serving stacks differ:
TorchServe does not provide the Instance Groups feature that Triton does (that is, stacking many copies of the same model, or even different models, onto the same GPU). This is a major advantage for both real-time and batch use cases, as the performance increase is almost proportional to the model replication count (i.e. 2 copies of the model get you almost twice the throughput and half the latency; check out a BERT benchmark of this here). It is hard to match a feature that is almost like having 2+ GPUs for the price of one.
If you are deploying PyTorch DL models, odds are you often want to accelerate them with GPUs. TensorRT (TRT) is a compiler developed by NVIDIA that automatically quantizes and optimizes your model graph, which represents another huge speed-up, depending on GPU architecture and model. It is probably the best way of automatically optimizing your model to run efficiently on GPUs and make good use of Tensor Cores. Triton has native integration to run TensorRT engines, as they're called (it can even convert your model to a TRT engine automatically via the config file), while TorchServe does not (even though you can use TRT engines with it).
There is more parity between the two when it comes to other important serving features: both have dynamic batching support, you can define inference DAGs with both (not sure whether the latter works with TorchServe on SageMaker without a big hassle), and both support custom code/handlers instead of only being able to serve a model's forward function.
Finally, MME on GPU (coming shortly) will be based on Triton, which is a valid argument for customers to get familiar with it so that they can quickly leverage this new feature for cost-optimization.
Bottom line, I think Triton is just as easy (if not easier) to use, a lot more optimized/integrated for taking full advantage of the underlying hardware (and will keep being updated that way as newer GPU architectures are released, enabling an easy move to them), and in general blows TorchServe out of the water performance-wise when its optimization features are used in combination.
Because I don't have enough reputation to reply in comments, I'm writing this as an answer.
MME stands for Multi-Model Endpoints. MME enables sharing GPU instances behind an endpoint across multiple models, and dynamically loads and unloads models based on the incoming traffic.
You can read it further in this link
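As an illustration of what MME looks like from the client side, here is a hedged boto3 sketch (the endpoint and model artifact names are made up): each InvokeEndpoint call names the model artifact it wants via TargetModel, and SageMaker loads it onto the shared instance on demand.

import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical names; the endpoint must be a SageMaker multi-model endpoint (MME).
response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",   # made-up endpoint name
    TargetModel="bert-variant-a.tar.gz",      # which model artifact serves this request
    ContentType="application/json",
    Body=b'{"inputs": "hello world"}',
)
print(response["Body"].read())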

How to implement a safety-critical AI compute cluster at the edge?

I want to experiment with developing a redundant autonomous-car compute architecture which can handle all the AI and other computing work. To do that, I bought some edge computing devices (NVIDIA Jetson TX2s), which contain an integrated GPU, and connected them with a gigabit Ethernet switch, so they can now communicate.
I need your advice on the system architecture. How can I implement this failsafe, safety-critical and redundant system? There are some cluster examples that provide high availability, but what I want is this: "Each compute node runs the same processes and then outputs its results to a master node. The master node analyses the results, votes, and picks the best one. If a compute node fails (bug, system down, loss of power, etc.), the system should be aware of the failure and transfer the failed node's compute load to healthy nodes. Also, each node should run some node-specific tasks without being affected by the cooperative processes."
What are your thoughts? Any keyword, suggestion, or method recommendation would help me.
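A minimal sketch of the voting and failover idea described above (plain Python with made-up node names and thresholds, no real middleware such as DDS or ROS 2) could look like the following; note that in a real safety-critical design the voter itself becomes a single point of failure and needs its own redundancy.

from collections import Counter
import time

HEARTBEAT_TIMEOUT_S = 0.5   # assumed heartbeat deadline, chosen for illustration

def healthy_nodes(last_heartbeat, now):
    """A node counts as healthy if its last heartbeat is recent enough."""
    return {n for n, t in last_heartbeat.items() if now - t < HEARTBEAT_TIMEOUT_S}

def vote(results):
    """Majority vote over {node_id: result}; None if there is no strict majority."""
    if not results:
        return None
    value, count = Counter(results.values()).most_common(1)[0]
    return value if count > len(results) / 2 else None

# Example: node C missed its heartbeat and also disagrees; the healthy majority wins.
now = time.monotonic()
heartbeats = {"A": now, "B": now, "C": now - 2.0}
raw_results = {"A": "brake", "B": "brake", "C": "accelerate"}
alive = healthy_nodes(heartbeats, now)
print(vote({n: r for n, r in raw_results.items() if n in alive}))   # "brake"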
The primary system/software safety standard for automobiles is ISO 26262. If you're going to be serious about making an automotive product, you'll want to acquire a copy and follow the process.
The primary classification for levels of autonomy in cars is SAE J3016_201806. You'll save a lot of headache up front by knowing which level you're shooting for beforehand. You may want to shoot for Level 1 ("hands on") like an adaptive cruise control or lane departure prevention system before trying to do more sophisticated things.
Here are some general themes that I've gleaned from doing safety stuff:
There is no generally-accepted way to determine a probability of software failure. There's even a school of thought that software does not fail. Instead, most safety standards assign safety-significant functionality implemented in software to different "levels" that require higher levels of scrutiny based on certain criteria including severity, closeness to a hazard (are there interlocks?), etc.
Most safety standards define software as everything running on the hardware, so you will need to ensure that the operating system you use also can meet the standards. This usually means a real-time operating system.
Keep your safety-significant functionality as simple as possible. If you can do something with elementary electrical circuits and logic gates (such as an emergency stop), do it, because the math and analysis are much more mature for hardware.
Acquire and follow a safety-relevant coding standard. The predominant one for automotive applications is MISRA C.
Look into using fault tree analysis to identify the relationships of failures required for a mishap to occur. This also helps identify single points of failure.
Try to alleviate hazards in the design if possible. Procedural mitigations and personal protective equipment should be a last resort.
At a minimum, you'll want a hard electrical emergency stop for the safety driver and a remote-controlled emergency stop operated by a spotter.

A/B Test feature in SageMaker: variant assignment is random?

The A/B test feature in SageMaker sounds intriguing, but the more I looked into it, the more confused I became about whether it is actually useful. For it to be useful, you need to get the variant-assignment data back and join it with some internal data to figure out the best-performing variant.
How is this assignment done? Is it purely random? Or am I supposed to pass some kind of ID (or hashed ID) that identifies a person or a browser, so that the same model is picked up for the same person?
For it to be useful, you need to get the variant-assignment data back and join it with some internal data to figure out the best-performing variant.
The InvokeEndpoint response includes the "InvokedProductionVariant" field, in order to support the kind of analysis you describe. Details can be found in the API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html#API_runtime_InvokeEndpoint_ResponseSyntax
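A short boto3 sketch (hypothetical endpoint name and payload) of capturing that assignment for later analysis: read InvokedProductionVariant from the InvokeEndpoint response and log it alongside whatever ID you use for your internal join.

import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name and payload; the point is recording which
# production variant served the request so it can be joined with outcome data.
response = runtime.invoke_endpoint(
    EndpointName="my-ab-test-endpoint",
    ContentType="application/json",
    Body=b'{"features": [1.0, 2.0, 3.0]}',
)
variant = response["InvokedProductionVariant"]
prediction = response["Body"].read()
print(variant, prediction)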
How is this assignment done? Is it purely random?
Traffic is distributed randomly while remaining proportional to the weights of the production variants.
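For reference, those weights are set on the endpoint config. A hedged sketch (made-up model and variant names, assuming both models already exist in SageMaker) of a 70/30 random split:

import boto3

sm = boto3.client("sagemaker")

# Made-up names; traffic is split randomly 70% / 30% according to the weights.
sm.create_endpoint_config(
    EndpointConfigName="ab-test-config",
    ProductionVariants=[
        {
            "VariantName": "variant-a",
            "ModelName": "my-model-a",        # hypothetical existing model
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.7,
        },
        {
            "VariantName": "variant-b",
            "ModelName": "my-model-b",        # hypothetical existing model
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.3,
        },
    ],
)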
so that the same model is picked up for the same person
Amazon SageMaker does not currently support this type of functionality, which is a major blocker for using it on some A/B tests.
I created a thread in the AWS SageMaker forum asking for this functionality to be added: https://forums.aws.amazon.com/thread.jspa?threadID=290644&tstart=0

Examples for Topological Sorting on Large DAGs

I am looking for real-world applications where topological sorting is performed on large graphs.
Some fields where I imagine you could find such instances are bioinformatics, dependency resolution, databases, hardware design, data warehousing... but I hope some of you may have encountered or heard of specific algorithms/projects/applications/datasets that require topsort.
Even if the data/project is not publicly accessible, any hints (and estimates of the order of magnitude of the potential graph sizes) would be helpful.
Here are some examples I've seen so far for Topological Sorting:
When scheduling task graphs in a distributed system, it is usually necessary to sort the tasks topologically and then assign them to resources. I am aware of task graphs containing more than 100,000 tasks that need to be sorted in topological order. See this in this context.
Once upon a time I was working on a document management system. Each document in this system had some kind of precedence constraint with respect to a set of other documents, e.g. its content type or field referencing. The system then had to generate an ordering of the documents that preserved the topological order. As far as I can remember, there were around 5,000,000 documents two years ago!
In the field of social networking, there is a famous query for the largest friendship distance in the network. This problem requires traversing the graph with a BFS approach, with a cost comparable to a topological sort. Consider the members of Facebook and find your answer.
If you need more real examples, do not hesitate to ask me. I have worked on lots of projects involving large graphs.
P.S. For large DAG datasets, you may take a look at the Stanford Large Network Dataset Collection and the Graphics# Illinois page.
I'm not sure if this fits what you're looking for, but do you know about the Bio4j project?
Not all the content stored in its graph-based DB would be suitable for topological sorting (directed cycles exist in an important part of the graph), but there are sub-graphs, such as Gene Ontology and Taxonomy, where this ordering may make sense.
TopoR is a commercial topological PCB router that works by first routing the PCB as a topological problem and then translating the topology into physical space. It supports up to 32 electrical layers, so it should be capable of handling many thousands of connections (say 10^4).
I suspect integrated circuits may use similar methods.
The company where I work manages a (proprietary) database of software vulnerabilities and patches. Patches are typically issued by a software vendor (like Microsoft, Adobe, etc.) at regular intervals, and "new and improved" patches "supersede" older ones, in the sense that if you apply the newer patch to a host then the old patch is no longer needed.
This gives rise to a DAG where each software patch is a node with arcs pointing to a node for each superseding patch. There are currently close to 10K nodes in the graph, and new patches are added every week.
Topological sorting is useful in this context to verify that the graph contains no cycles: if a cycle does arise, it means that either there was an error in the addition of a new DB record, or corruption was introduced by botched data replication between DB instances.
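As a sketch of that use, here is a minimal Kahn's-algorithm pass (generic Python with made-up patch IDs, not the company's actual code) that either returns a topological order of the patch DAG or reports that a cycle has slipped in:

from collections import defaultdict, deque

def topological_order(edges):
    """Kahn's algorithm over (old_patch, superseding_patch) pairs.
    Returns a topological order, or None if the graph contains a cycle."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1
        nodes.update((src, dst))

    queue = deque(n for n in nodes if in_degree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    return order if len(order) == len(nodes) else None   # None => cycle detected

print(topological_order([("KB1", "KB2"), ("KB2", "KB3")]))   # ['KB1', 'KB2', 'KB3']
print(topological_order([("KB1", "KB2"), ("KB2", "KB1")]))   # None (cycle)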
