The B-link tree introduced in Lehman and Yao (1981) claims that any insert operation needs at most 3 locks simultaneously.
I am having a hard time finding a concrete scenario where 3 locks are held at once. What is the scenario?
The scenario appears when:
Split leaf. A leaf node (originally A) is split (into A1 and A2), hence the split key (max(A1)) needs to be inserted into the parent node (T).
Parent node has link. The parent node T also has a valid link pointer that points to S. The value has to be inserted into S instead of T.
The three locks are:
On leaf node A1: to prevent further split of this node (and also nodes that its link points to)
On T: when Move.right is performed (see below).
On S: when Move.right is performed (see below).
[Move.right]
while True:
    S = scannode(v, T)
    if not isLinkPointer(S):
        break          # S is a child pointer: T is the right node
    lock(S)            # <-- 3 locks *
    unlock(T)          # <-- 2 locks
    T = S
Locks 2 and 3 are more like "transition locks" that have to be acquired when moving right. Therefore the 3-lock scenario really lasts only a very short time.
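To make the peak visible, here is a minimal single-threaded sketch that only counts locks. `Node`, `LockCounter` and `move_right` are hypothetical names invented for this illustration, and the high-key comparison is an assumed stand-in for the paper's scannode; it is not the paper's code.

```python
class Node:
    """Hypothetical B-link node: only a high key and a right-link pointer."""
    def __init__(self, name, high_key, link=None):
        self.name = name
        self.high_key = high_key
        self.link = link


class LockCounter:
    """Records which nodes are locked and the peak number held at once."""
    def __init__(self):
        self.held = set()
        self.peak = 0

    def lock(self, node):
        self.held.add(node.name)
        self.peak = max(self.peak, len(self.held))

    def unlock(self, node):
        self.held.remove(node.name)


def move_right(locks, node, key):
    """Follow link pointers while key lies beyond node's high key.

    `node` must already be locked; the node that is returned is locked.
    """
    while node.link is not None and key > node.high_key:
        right = node.link
        locks.lock(right)      # both neighbours are held for a moment
        locks.unlock(node)
        node = right
    return node


# Parent level: T was split earlier, so T has a link pointer to S.
S = Node("S", high_key=100)
T = Node("T", high_key=50, link=S)
A1 = Node("A1", high_key=70)   # left half of the leaf that was just split

locks = LockCounter()
locks.lock(A1)                 # lock 1: the split leaf
locks.lock(T)                  # lock 2: the parent we climbed to
target = move_right(locks, T, key=A1.high_key)   # lock 3 is taken inside
locks.unlock(target)           # after the split key is inserted into S
locks.unlock(A1)
```

In this run `locks.peak` reaches 3 only inside `move_right`, between `lock(right)` and `unlock(node)`, which matches the "tiny amount of time" remark above.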
A Crude Graphic Illustration.
I'm trying to figure out the proper way to build a kind of barrier when merging multiple streams in Flink.
So let's say I have 4 keyed streams, each calculating some aggregated statistics over batches of data. Next I want to combine the results of these 4 streams into one stream (Y) and perform some additional computation on the 4 received summaries.
The problem is how to make node Y wait until it has received all the summaries with X=N before going forward with X=N+1.
In the picture, node 3 sent its summary for X=N later than node 4 sent its summary for X=N+1,
so node Y must wait until it has received the node 3 summary, while somehow caching the summaries with X=N+1 from the other nodes.
I couldn't find anything similar in the documentation, so I'd really appreciate any hints.
I figured out this task can be solved by simply doing the following:
.keyBy(X)
.countWindow(4)
.fold(...)
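The idea behind keying by X and counting four elements can be illustrated outside Flink. The following is a plain-Python simulation, not Flink API; `BatchBarrier` and `on_summary` are hypothetical names, and the event values are made up for the example.

```python
from collections import defaultdict


class BatchBarrier:
    """Buffers summaries per batch id and releases a batch once every source reported."""

    def __init__(self, num_sources):
        self.num_sources = num_sources
        self.pending = defaultdict(list)   # batch id X -> summaries seen so far

    def on_summary(self, batch_id, summary):
        """Return all summaries for batch_id once complete, otherwise None."""
        bucket = self.pending[batch_id]
        bucket.append(summary)
        if len(bucket) == self.num_sources:
            return self.pending.pop(batch_id)
        return None


barrier = BatchBarrier(num_sources=4)
# (batch id, source node, value): node 4 sends batch 2 before node 3 sends batch 1
events = [(1, "n1", 10), (1, "n2", 20), (1, "n4", 40), (2, "n4", 41), (1, "n3", 30)]
released = []
for batch_id, node, value in events:
    done = barrier.on_summary(batch_id, (node, value))
    if done is not None:
        released.append((batch_id, done))
```

Batch 1 is released only when node 3 finally reports, while the early summary for batch 2 stays buffered. Note that this sketch, like a per-key count, does not by itself force batch N to be released before batch N+1.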
We have two OSB nodes in a cluster. One node, osb1, has a low overall response time (1 sec) when measured in AppDynamics; the other node, osb2, has a high response time (20 sec). We brought down each node in turn and tested the other one individually, and we see the same behavior. Any suggestions on what to look into to identify the issue? The OSB configuration is identical across both nodes, and the JVM configuration is also identical. Heap usage is the same. CPU usage differs a bit.
I want one primary collection of items of a single type that modifications are made to over time. Periodically, several slave collections are going to synchronize with the primary collection. The primary collection should send a delta of items to the slave collections.
Primary Collection: A, C, D
Slave Collection 1: A, C (add D)
Slave Collection 2: A, B (add C, D; remove B)
The slave collections cannot add or remove items on their own, and they may exist in a different process, so I'm probably going to use pipes to push the data.
I don't want to push more data than necessary since the collection may become quite large.
What kind of data structures and strategies would be ideal for this?
For that I use differential execution.
(BTW, the word "slave" is uncomfortable for some people, with reason.)
For each remote site, there is a sequential file at the primary site representing what exists on the remote site.
There is a procedure at the primary site that walks through the primary collection, and as it walks it reads the corresponding file, detecting differences between what currently exists on the remote site and what should exist.
Those differences produce deltas, which are transmitted to the remote site.
At the same time, the procedure writes a new file representing what will exist at the remote site after the deltas are processed.
The advantage of this is it does not depend on detecting change events in the primary collection, because often those change events are unreliable or can be self-cancelling or made irrelevant by other changes, so you cut way down on needless transmissions to the remote site.
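As a rough sketch of that procedure for simple collections: the shadow record is kept here as an in-memory set per remote site rather than a sequential file (an assumption made to keep the example short), and `sync_remote` is a hypothetical name.

```python
def sync_remote(primary, shadow):
    """Compare the primary collection with the shadow record of one remote site.

    Returns (delta, new_shadow): delta is the list of operations to transmit,
    new_shadow is what the remote site will hold after applying them.
    """
    current = set(primary)
    delta = [("add", item) for item in sorted(current - shadow)]
    delta += [("remove", item) for item in sorted(shadow - current)]
    return delta, current


primary = ["A", "C", "D"]
shadows = {"remote1": {"A", "C"}, "remote2": {"A", "B"}}
deltas = {}
for site, shadow in shadows.items():
    # Transmit deltas[site], then keep the new shadow for the next run.
    deltas[site], shadows[site] = sync_remote(primary, shadow)
```

With the collections from the question, `remote1` only receives an add for D, and `remote2` receives adds for C and D plus a remove for B. Nothing is sent for items that changed and changed back between two runs.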
In the case that the collections are simple lists of things, this boils down to having local copies of the remote collections and running a diff algorithm to get the delta.
Here are a couple such algorithms:
If the collections can be sorted (like your A,B,C example), just run a merge loop:
// X is the new (primary) list, Y is the old copy; both are sorted.
while (ix < nx && iy < ny) {
    if (X[ix] < Y[iy]) {
        // X[ix] was inserted in X
        ix++;
    } else if (Y[iy] < X[ix]) {
        // Y[iy] was deleted from X
        iy++;
    } else {
        // the two elements are equal. skip them both;
        ix++; iy++;
    }
}
while (ix < nx) {
    // X[ix] was inserted in X
    ix++;
}
while (iy < ny) {
    // Y[iy] was deleted from X
    iy++;
}
If the collections cannot be sorted (note the relationship to Levenshtein distance):
Until we have read through both collections X and Y,
    See if the current items are equal
    else see if a single item was inserted in X
    else see if a single item was deleted from X
    else see if 2 items were inserted in X
    else see if a single item was replaced in X
    else see if 2 items were deleted from X
    else see if 3 items were inserted in X
    else see if 2 items in X replaced 1 item in Y
    else see if 1 item in X replaced 2 items in Y
    else see if 3 items were deleted from X
    etc. etc. up to some limit
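For the unsorted case, one way to experiment without writing the lookahead search by hand is Python's standard `difflib.SequenceMatcher`. To be clear, this swaps in a different matching algorithm from the one outlined above; it just yields the same kind of insert/delete/replace deltas. `sequence_delta` is a hypothetical helper name, and the items are assumed to be hashable.

```python
from difflib import SequenceMatcher


def sequence_delta(old, new):
    """Return the (tag, old_items, new_items) steps that turn old into new."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue               # nothing to transmit for unchanged runs
        ops.append((tag, old[i1:i2], new[j1:j2]))
    return ops


appended = sequence_delta(["B", "A"], ["B", "A", "D"])
replaced = sequence_delta(["A", "B"], ["A", "C", "D"])
```

Here `appended` contains a single insert of D, and `replaced` contains a single step that replaces B with C, D.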
Performance is generally not an issue, because the procedure does not have to be run at high frequency.
There's a crude video demonstrating this concept, and source code where it is used for dynamically changing user interfaces.
If one doesn't push all data, some sort of log is required, which uses main memory instead of pipe bandwidth. The parameter for finding a good balance between CPU and memory usage would be the 'push' frequency.
From your question, I assume you have more than one slave process. In this case, a shared memory or CMA (Linux) approach with double buffering in the master process should outperform multiple pipes by far, as it doesn't even require multithreaded pushing, which would otherwise be used to optimize the overall pipe throughput during synchronization.
The slave processes could be notified using a global synchronization barrier for reading from masterCollectionA without copying, while the master modifies masterCollectionB (which is initialized with a copy of masterCollectionA), and vice versa. Access to a collection should be interlocked between slaves and master. A slave could copy that collection (take a snapshot) if it would otherwise block it past the next update attempt from the master, thus allowing the master to continue. Modifications in slave processes could be implemented with a copy-on-write strategy for single elements. This cooperative approach is rather simple to implement, and as long as the slave processes don't copy whole snapshots every time, the overall memory consumption is low.
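A heavily simplified sketch of the double-buffering part, assuming threads inside one process instead of shared memory between processes, and with readers always taking a snapshot; `DoubleBuffer` and its methods are hypothetical names.

```python
import threading


class DoubleBuffer:
    """Two copies of a collection: readers see one while the writer edits the other."""

    def __init__(self, items):
        self._buffers = [list(items), list(items)]
        self._front = 0                   # index of the buffer readers use
        self._lock = threading.Lock()     # interlocks swaps and snapshots

    def snapshot(self):
        """Reader side: copy the currently published collection."""
        with self._lock:
            return list(self._buffers[self._front])

    def back(self):
        """Writer side: the collection that is safe to modify."""
        return self._buffers[1 - self._front]

    def publish(self):
        """Swap the buffers, then bring the new back buffer up to date."""
        with self._lock:
            self._front = 1 - self._front
            self._buffers[1 - self._front] = list(self._buffers[self._front])


buf = DoubleBuffer(["A", "C"])
buf.back().append("D")        # the master edits the back buffer
before = buf.snapshot()       # readers still see the old state
buf.publish()
after = buf.snapshot()        # readers now see the update
```

The copy in `publish` corresponds to initializing masterCollectionB from masterCollectionA; a real shared-memory version would need a cross-process lock or barrier in place of `threading.Lock`.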
I have 200 groups. Each group has 100 devices, i.e. a total of 20000 devices divided into 200 groups of 100 each.
Now when each device gets registered with the server, the server assigns a group id to the device. (100 devices have the same group id.) At a later stage the server sends the multicast data with the group id so that the data is received by all the devices having that group id.
The problem is that I need to allocate a single chunk of memory (say 25 bytes) for each group to store the data, so that all the devices in that group will use that chunk for their processing. My idea is to allocate one big chunk (say 25 * 200 = 5000 bytes) and assign each group a 25-byte block (grp0 points to the start address, grp1 points to start+25, and so on).
Is this the best way? Any other ideas?
For your example, I would use an array.
Provided the number of your clients does not change, allocating a single block is the most efficient way:
You do a single malloc call instead of 100
you avoid the overhead associated with the list that tracks every memory block allocation
your data is kept in one piece, which makes it more easily cacheable by the processor cache, compared to 100 small blocks placed god-knows-where
That said, the difference with just 100 elements is probably negligible, but multiplied by 200 groups it can give you a performance boost (it really depends on how you are using the data structure).
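The offset arithmetic from the question (group i starts at start + 25 * i) can be sketched as follows. This uses a Python `bytearray` as a stand-in for the single malloc'd block; `group_block` is a hypothetical helper name.

```python
GROUP_COUNT = 200
BLOCK_SIZE = 25    # bytes per group, as in the question

pool = bytearray(GROUP_COUNT * BLOCK_SIZE)    # one allocation of 5000 bytes
view = memoryview(pool)


def group_block(group_id):
    """Return a writable view of the 25-byte block for group_id (no copy)."""
    if not 0 <= group_id < GROUP_COUNT:
        raise IndexError("group id out of range")
    start = group_id * BLOCK_SIZE
    return view[start:start + BLOCK_SIZE]


block = group_block(3)
block[:5] = b"hello"    # lands at bytes 75..79 of the pool
```

Every device in group 3 that calls `group_block(3)` sees the same 25 bytes, since the views share the underlying pool.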
In the case of a dynamic structure instead (for example, your clients connect and disconnect, so there are not always 100), you should use a linked list, which allocates memory when needed (so you end up with 100 different memory blocks).
As stated by ArjunShankar, you will take O(1) time to ACCESS a device within a group, that's not bad assuming you don't have to process too much to find a specific device (assuming you have to find it). If you're planning to process them simultaneously and the number grows large (or your available memory is limited), you should take a look at some techniques such as disk pagination.