grab() function in seq and seqr in UVM

The grab() method can be found in both uvm_sequence_base and uvm_sequencer_base, and the explanations of lock and grab confuse me.
I think of the sequence as flowing water and of the sequencer as the valve: only the sequencer can be blocked or opened.
Am I right? Please explain grab() both in the sequence and in the sequencer.

I have tried using both in various situations; here are my observations.
Let's assume 4 streams of sequences, S1, S2, S3, S4 (each containing 10 subsequences), waiting for the sequencer to grant access.
1) If grab/ungrab is used for only ONE of the sequences, say S2, then regardless of how arbitration is set, S2 gets access first. All the subsequences of S2 are executed, followed by the other sequences, which are executed according to the arbitration setting.
2) If lock/unlock is used for only ONE of the sequences, say S3, it waits for its turn to get sequencer access, and then all of its subsequences are executed one after the other.
Order of arrival-> S1,S2,S3,S4
assuming arbitration is SEQ_ARB_FIFO (Default)
execution -> S1.seq1 -> S2.seq1 -> S3.seq1 -> S3.seq2 -> .......-> S3.seq10 -> S4.seq1 -> S1.seq2 -> S2.seq2 -> ......->S4.seq10
When multiple sequences use lock/grab, those requests are given priority, and the queued requests are granted in FIFO order regardless of the arbitration settings.
eg1: Order of arrival-> S1,S2,S3,S4
S1: lock/unlock
S3: grab/ungrab
assuming arbitration is SEQ_ARB_FIFO (Default)
execution -> S1.seq1 -> S1.seq2 -> ....S3.seq1 -> S3.seq2 -> .......-> S3.seq10 -> S2.seq1 -> S4.seq1 -> S2.seq2 -> ......->S4.seq10
eg2: Order of arrival-> S1,S2,S3,S4
S1: lock/unlock
S3: lock/unlock
assuming arbitration is SEQ_ARB_FIFO (Default)
execution -> S1.seq1 -> S1.seq2 -> ....S3.seq1 -> S3.seq2 -> .......-> S3.seq10 -> S2.seq1 -> S4.seq1 -> S2.seq2 -> ......->S4.seq10

grab() and lock() are very similar. The only difference is that a grab() request is put at the front of the sequencer's arbitration queue, while a lock() request is put at the back of it.
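As a rough conceptual illustration only (this is a toy Java model of an arbitration queue, not the UVM API; all names are mine), the difference can be pictured as lock() appending a request to the back of the queue while grab() pushes it to the front:
(Java)
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of a sequencer arbitration queue (illustrative names, not UVM API):
// lock() requests wait their turn at the back, grab() requests jump to the front.
public class ArbitrationQueueModel {
    private final Deque<String> queue = new ArrayDeque<>();

    void lockRequest(String seq) { queue.addLast(seq); }   // queued behind earlier requests
    void grabRequest(String seq) { queue.addFirst(seq); }  // granted before everything queued

    public static void main(String[] args) {
        ArbitrationQueueModel q = new ArbitrationQueueModel();
        q.lockRequest("S1");
        q.lockRequest("S2");
        q.grabRequest("S3");
        System.out.println(q.queue);  // [S3, S1, S2] -> S3 is granted first
    }
}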
This blog has one of the best explanations I have found about how to use the UVM sequencer built-in grab and lock functions:
http://sagar5258.blogspot.com/2017/02/

Related

How to reward agent's action in self-play adversarial game reinforcement learning?

I am a beginner in this field and I am trying to implement an agent that plays an adversarial game such as chess. I create two agents that share the same neural network and experience buffer. At every step, the neural network is updated by both agents (with the feature order swapped).
Does my self-play approach make sense? And if it does, how should I reward the agents' behavior?
More concretely, given this sequence:
(0) state -> (1) agent0 action -> (2) reward -> (3) state -> (4) agent1 action -> (5) reward -> (6) state
Is the next state of agent0 after (1) state (3) or state (6)? And is the corresponding reward (2), (5), or something else (for example (2) - (5))?
Here is my understanding. You start with two randomly initialized networks. Please don't share the network. Then you train one of them, and only that one. When the network under training begins to show some progress, you stop the match and make a copy of the improved network to serve as the new opponent. Then you continue. This way you bootstrap both yourself and your opponents: you don't try to learn against an expert, but against someone who is more or less your equal.
Above I've described keeping only one copy of a previous version of yourself, but you can of course keep the last 10 snapshots and play against all of them.
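A minimal sketch of that loop, with training and evaluation stubbed out (every name here, such as Net, trainOneIteration, winRateAgainst, and the 0.55 threshold, is a hypothetical placeholder, not a real RL framework):
(Java)
import java.util.ArrayList;
import java.util.List;

// Sketch of the bootstrap loop described above; training and evaluation are
// stubbed out, and every name here is a hypothetical placeholder.
public class SelfPlayBootstrap {

    static class Net {                      // stand-in for a policy/value network
        Net copy() { return new Net(); }    // snapshot of the current weights
    }

    static void trainOneIteration(Net learner, Net frozenOpponent) {
        // play games learner vs. frozenOpponent, store transitions in the
        // experience buffer, and update ONLY the learner's weights
    }

    static double winRateAgainst(Net learner, Net opponent) {
        return 0.0;                         // placeholder evaluation
    }

    public static void main(String[] args) {
        Net learner = new Net();            // randomly initialized
        List<Net> opponents = new ArrayList<>();
        opponents.add(new Net());           // initial opponent, also random

        for (int iter = 0; iter < 1000; iter++) {
            for (Net opponent : opponents) {
                trainOneIteration(learner, opponent);
            }
            // once the learner clearly beats the newest opponent, freeze a
            // copy of it and add that copy to the opponent pool
            if (winRateAgainst(learner, opponents.get(opponents.size() - 1)) > 0.55) {
                opponents.add(learner.copy());
                if (opponents.size() > 10) {
                    opponents.remove(0);    // keep only the last 10 snapshots
                }
            }
        }
    }
}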

B-link tree insert operation

The B-link tree introduced by Lehman and Yao (1981) claims that any insert operation needs at most 3 locks simultaneously.
I am having a hard time finding a concrete scenario in which 3 locks are acquired. What is the scenario?
The scenario appears when:
Split leaf. A leaf node (originally A) is split (into A1 and A2), so the split key (max(A1)) needs to be inserted into the parent node (T).
Parent node has a link. The parent node T also has a valid link pointer that points to S, and the key has to be inserted into S instead of T.
The three locks are:
On leaf node A1: to prevent further splits of this node (and of the nodes its link points to).
On T: while the move-right step is performed (see below).
On S: while the move-right step is performed (see below).
[Move.right]
# walk right from T until the node responsible for key v is found
while True:
    S = scannode(v, T)        # examine T; returns a child or the right-link target
    if isLinkPointer(S):      # v's key range has moved to the right of T
        lock(S)               # <-- 3 locks held here (A1, T, S)
        unlock(T)             # <-- back down to 2 locks (A1, S)
        T = S
    else:
        break                 # T is the correct node; stop moving right
Locks 2 and 3 are more like "transition locks" that have to be acquired while moving right, so the 3-lock window lasts only a very short time.

Starvation of one of 2 streams in ConnectedStreams

Background
We have 2 streams, let's call them A and B.
They produce elements a and b respectively.
Stream A produces elements at a slow rate (one every minute).
Stream B receives a single element once every 2 weeks. It uses a flatMap function which receives this element and generates ~2 million b elements in a loop:
(Java)
for (BElement value : valuesList) {
    out.collect(value);   // emit each generated b element
}
The valuesList here contains ~2 million b elements.
We connect these two streams (A and B) using connect, key by some key, and perform another flatMap on the connected stream:
streamA.connect(streamB).keyBy(AClass::someKey, BClass::someKey).flatMap(processConnectedStreams)
Each of the b elements has a different key, meaning there are ~2 million keys coming from the B stream.
The Problem
What we see is starvation. Even though there are a elements ready to be processed, they are not processed by processConnectedStreams.
Our attempts to solve the issue
We tried to throttle stream B to 10 elements per second by performing a Thread.sleep() every 10 elements:
long totalSent = 0;
for (BElement value : valuesList) {
    totalSent++;
    out.collect(value);            // emit each generated b element
    if (totalSent % 10 == 0) {
        Thread.sleep(1000);        // throttle to at most 10 elements per second
    }
}
The processConnectedStreams is simulated to take 1 second with another Thread.sleep(), and we have tried:
* Setting a parallelism of 10 for the whole pipeline - didn't work
* Setting a parallelism of 15 for the whole pipeline - did work
The question
We don't want to use all these resources, since stream B is activated very rarely and, for stream A's elements, such high parallelism is overkill.
Is it possible to solve this without setting the parallelism to more than the number of b elements we send every second?
It would be useful if you shared the complete workflow topology. For example, you don't mention doing any keying or random partitioning of the data. If that's really the case, then Flink will chain (pipeline) multiple operations into one task, which can, depending on the topology, lead to the problem you're seeing.
If that's the case, then forcing partitioning prior to processConnectedStreams can help, as that operation will then be reading from network buffers.
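As a sketch only (it reuses the class, key, and function names from the question and assumes the rest of the job is unchanged), keying each input explicitly before connect() is one way to force a network shuffle in front of the co-flatMap, so that it no longer shares a task with the flatMap that expands stream B:
(Java)
// Sketch: explicit keying on each input forces a network boundary before the
// co-flatMap. Names (streamA, streamB, AClass, BClass, someKey,
// processConnectedStreams) are taken from the question.
streamA
    .keyBy(AClass::someKey)
    .connect(streamB.keyBy(BClass::someKey))
    .flatMap(processConnectedStreams);
// Alternatively, calling .disableChaining() on the flatMap that expands
// stream B breaks the operator chain at that point.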

Using JGraphT to Manage Ordering of Dependent Tasks

I have a list of tasks that have dependencies between them, and I was considering how I could use JGraphT to manage the ordering of the tasks. I would set up the graph as a directed graph and remove vertices as I processed them (or should I mask them?). I could use TopologicalOrderIterator if I were only going to execute one task at a time, but I'm hoping to parallelize the tasks. I could get a TopologicalOrderIterator and check Graphs.vertexHasPredecessors until I find as many tasks as I want to execute at once, but ideally there would be something like Graphs.getVerticesWithNoPredecessors. I see that Netflix provides a utility to get leaf vertices, so I could reverse the graph and use that, but it's probably not worth it. Can anyone point me to a better way? Thanks!
A topological order may not necessarily be what you want. Here's an example why not. Take the topological ordering of tasks [1,2,3,4] and the arcs (1,3) and (2,4); that is, task 1 needs to be completed before task 3, and task 2 before task 4. Let's also assume that task 1 takes a really long time to complete. We can start processing tasks 1 and 2 in parallel, but we cannot start 3 before 1 completes. Even when task 2 completes, we cannot start task 4, because task 3 is the next task in the ordering and it is still blocked by task 1.
Here's what you could do instead. Create an array dep[] which tracks the number of unfulfilled dependencies per task. dep[i]==0 means that all dependencies of task i have been fulfilled, so we can now perform task i; if dep[i]>0, we cannot perform task i yet. Assume there is a task j which needs to be performed prior to task i. As soon as we complete task j, we decrement the number of unfulfilled dependencies of task i, i.e. dep[i]=dep[i]-1. If dep[i] then becomes 0, we are ready to process task i.
So, in short, the algorithm in pseudocode looks like this (a Java sketch follows the list):
1) Initialize the dep[] array.
2) Start processing, in parallel, all tasks i with dep[i]==0.
3) When a task i completes, decrement dep[j] for every task j that depends on i. If dep[j] reaches 0, start processing task j.
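Here is a minimal sketch of that idea using a thread pool and atomic counters (the task ids and the dependency map are made-up example data, and the class name is my own):
(Java)
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the dep[] approach: a task is submitted to the pool as soon as its
// unfulfilled-dependency counter drops to zero. Example data: 1 -> 3, 2 -> 4.
public class DepCounterScheduler {
    // DEPENDENTS.get(i) = tasks that must wait for task i
    static final Map<Integer, List<Integer>> DEPENDENTS = Map.of(
            1, List.of(3), 2, List.of(4), 3, List.of(), 4, List.of());
    // DEP.get(i) = number of unfulfilled dependencies of task i
    static final Map<Integer, AtomicInteger> DEP = Map.of(
            1, new AtomicInteger(0), 2, new AtomicInteger(0),
            3, new AtomicInteger(1), 4, new AtomicInteger(1));

    static final ExecutorService POOL = Executors.newFixedThreadPool(4);
    static final CountDownLatch DONE = new CountDownLatch(DEPENDENTS.size());

    public static void main(String[] args) throws InterruptedException {
        // start every task whose dependencies are already fulfilled
        DEP.forEach((task, count) -> { if (count.get() == 0) submit(task); });
        DONE.await();        // wait until every task has completed
        POOL.shutdown();
    }

    static void submit(int task) {
        POOL.submit(() -> {
            System.out.println("running task " + task);   // do the actual work here
            // completing this task releases its dependents
            for (int j : DEPENDENTS.get(task)) {
                if (DEP.get(j).decrementAndGet() == 0) submit(j);
            }
            DONE.countDown();
        });
    }
}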
You could certainly use a directed graph to model the dependencies. Each time you complete a task, you simply iterate over its outgoing neighbors (in JGraphT, use Graphs.successorListOf(graph, vertex)). The graph can also be used to check feasibility: if it contains a cycle, you have a problem in your dependencies. However, if you don't need this heavy machinery, I would simply build, for each task i, the list of tasks that depend on i (a 2-dimensional array or list of lists).
The resulting algorithm runs in O(n+m) time, where n is the number of tasks and m the number of arcs (dependencies). So this is very efficient.
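If you do stay with JGraphT, there is no Graphs.getVerticesWithNoPredecessors, but filtering the vertex set on inDegreeOf(v) == 0 gives exactly the tasks that are ready to run. A small sketch (the task names are made up):
(Java)
import java.util.List;
import java.util.stream.Collectors;
import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;

// Sketch: find the tasks with no unfinished predecessors in a JGraphT graph.
public class ReadyTasks {
    public static void main(String[] args) {
        Graph<String, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);
        g.addVertex("t1"); g.addVertex("t2"); g.addVertex("t3");
        g.addEdge("t1", "t3");   // t1 must finish before t3
        g.addEdge("t2", "t3");   // t2 must finish before t3

        // tasks whose dependencies are all satisfied are ready to run now
        List<String> ready = g.vertexSet().stream()
                .filter(v -> g.inDegreeOf(v) == 0)
                .collect(Collectors.toList());
        System.out.println(ready);   // e.g. [t1, t2]

        // after a task completes, remove its vertex and recompute the ready set
        g.removeVertex("t1");
    }
}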

Distributed Mutual Exclusion: Coterie Formation

I have been studying distributed mutual exclusion algorithms based on the concept of Quorums.
Quoting:
A Coterie C is defined as a set of sets, where each set g ∈ C is called a quorum.
The following properties hold for quorums in a coterie:
1) Intersection property: for every pair of quorums g, h ∈ C, g ∩ h ≠ ∅.
For example, the sets {1,2,3}, {2,5,7} and {5,7,9} cannot all be quorums in a
coterie because the first and third sets do not have a common element.
2) Minimality property: There should be no quorums g, h in coterie C such
that g ⊇ h. For example, sets {1,2,3} and {1,3} cannot be quorums in a
coterie because the first set is a superset of the second.
I would like to know, given a set of nodes in a distributed system, how such coteries or sets of quorums are formed from those nodes.
What are the algorithms or techniques for doing this?
UPDATE:
To put the problem in other words -
"Given 'N' nodes, what is the best way to form 'K' quorums such that any two of them have 'J' number of nodes in common?"
A simple read/write protocol would be: read from every node in some quorum and write to every node in some quorum. Because any two quorums intersect, every reader is guaranteed to see the latest written item.
Since your title is about mutual exclusion: a peer in the system can ask every node in a quorum for a lock on a resource. Because of the intersection property, no other peer can obtain the lock from a full quorum at the same time.
As far as I know, in practice you contact random nodes and use n/2 + 1 of them as a quorum (a simple majority), but as the examples below show, you can also define more sophisticated quorum systems with smaller quorums, which again improves performance.
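As a toy, single-process illustration of that locking idea (not a real distributed protocol: there is no messaging, failure handling, or fairness, and all names are mine), each server grants its vote to at most one requester, and a requester holds the lock only if every server in its quorum has granted a vote:
(Java)
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration of quorum-based locking. Because any two quorums intersect,
// two requesters can never both collect votes from a full quorum.
public class QuorumLockSketch {
    // server id -> id of the requester currently holding that server's vote
    static final Map<Integer, Integer> votes = new ConcurrentHashMap<>();

    static boolean tryAcquire(int requester, Set<Integer> quorum) {
        for (int server : quorum) {
            // grant only if the server has not already voted for someone else
            if (votes.putIfAbsent(server, requester) != null
                    && votes.get(server) != requester) {
                release(requester, quorum);     // give back any partial votes
                return false;
            }
        }
        return true;                            // full quorum collected: lock held
    }

    static void release(int requester, Set<Integer> quorum) {
        for (int server : quorum) {
            votes.remove(server, requester);    // only remove our own votes
        }
    }

    public static void main(String[] args) {
        Set<Integer> q1 = Set.of(1, 2, 3, 4, 5);   // majority quorums over 9 servers
        Set<Integer> q2 = Set.of(5, 6, 7, 8, 9);   // overlaps q1 at server 5
        System.out.println(tryAcquire(100, q1));   // true  (requester 100 holds the lock)
        System.out.println(tryAcquire(200, q2));   // false (server 5 already voted)
        release(100, q1);
        System.out.println(tryAcquire(200, q2));   // true
    }
}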
Update:
Examples for such quorums with 9 servers could be the following:
2 quorums: servers 1-5 are one quorum and 5-9 would be another (simple majority)
3 quorums: servers 1,2,3,4; 4,5,6,7; and 7,8,9,1 could be 3 different quorums
more quorums: servers 1,2,3; 3,4,5; 5,6,1; 6,7,3; 8,3,1; 9,3,1 could be 6 different quorums. However, here you can see that servers 1 and 3 are part of many of the quorums (4 and 5 of them, respectively) and will therefore have to handle much more traffic.
You could potentially also create quorums like 1,2; 1,3; 1,4; 1,5; 1,6; 1,7; 1,8; 1,9; but since every quorum contains server 1, this is effectively the same as just having server 1.
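To make the two coterie properties concrete, here is a small sketch that checks the intersection and minimality properties for a candidate set of quorums (the example data comes from the 3-quorum configuration above; class and method names are my own):
(Java)
import java.util.List;
import java.util.Set;

// Sketch: verify the intersection and minimality properties for a candidate coterie.
public class CoterieCheck {

    static boolean isCoterie(List<Set<Integer>> quorums) {
        for (int i = 0; i < quorums.size(); i++) {
            for (int j = 0; j < quorums.size(); j++) {
                if (i == j) continue;
                Set<Integer> g = quorums.get(i), h = quorums.get(j);
                // intersection property: every pair of quorums must overlap
                if (g.stream().noneMatch(h::contains)) return false;
                // minimality property: no quorum may contain another
                if (g.containsAll(h)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<Integer>> threeQuorums = List.of(
                Set.of(1, 2, 3, 4),
                Set.of(4, 5, 6, 7),
                Set.of(7, 8, 9, 1));
        System.out.println(isCoterie(threeQuorums));   // true

        List<Set<Integer>> badPair = List.of(
                Set.of(1, 2, 3),
                Set.of(5, 7, 9));                      // no common element
        System.out.println(isCoterie(badPair));        // false
    }
}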
