I am playing with Gecode to solve a problem where you label the arcs of a multi-digraph so that the sums of the arc labels along each path from a single source to a single sink form a given multiset. For example, we might have a graph with 8 paths and want the path arc sums to be the multiset {5,5,5,4,3,2,1,0}: exactly three paths with a sum of 5 and five paths whose sums are 0, 1, 2, 3 and 4, one each.
You can reformulate this problem as asking if some permutation of {5,5,5,4,3,2,1,0} is in the column space of the path arc incidence matrix of the graph.
I model this multiset match with the "count" (global cardinality) constraint; the path sums themselves are linear equations over the arc labels.
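To make the reformulation concrete, here is a tiny brute-force analogue in Python (a sketch over a made-up toy instance, not the Gecode model): enumerate arc labels, compute each path sum, and compare multisets — that comparison is the job "count" does inside the solver.

from itertools import product
from collections import Counter

# Hypothetical toy instance: each path is a list of arc indices.
paths = [[0, 2], [0, 3], [1, 2], [1, 3]]   # 4 paths over 4 arcs
target = Counter([3, 2, 1, 0])             # required multiset of path sums

for labels in product(range(4), repeat=4): # candidate arc labels 0..3
    sums = [sum(labels[a] for a in p) for p in paths]
    if Counter(sums) == target:            # the multiset ("count") check
        print("arc labels:", labels, "path sums:", sums)
        break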
My graphs have many parallel edges that come in pairs, and I use this symmetry to impose a partial order on the path sums. It also means there are sets of pairs of path sums that share the same difference. In my example, if the paths are b0...b7, I have the following pair sets:
b0 - b1 = b3 - b4 = b5 - b6, b0 - b2 = b5 - b7, b0 - b3 = b1 - b4, b0 - b5 = b1 - b6 = b2 - b7
b1 - b2 = b6 - b7, b3 - b5 = b4 - b6
Including these differences in the model seems to cut the search space by two orders of magnitude in Gecode. I am pleased with this because I think it is telling me something important about the graphs I am studying, and it fits with some conjectures in the area I work in.
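In solver terms these ties are just extra linear equalities over the path-sum variables. A plain-Python transcription of the six chains (a sketch in the brute-force style above; in Gecode each chain would be posted as linear equations):

def differences_ok(b):
    # b is the list of eight path sums b0..b7; each chained comparison
    # below is one of the pair sets listed above.
    return (b[0] - b[1] == b[3] - b[4] == b[5] - b[6] and
            b[0] - b[2] == b[5] - b[7] and
            b[0] - b[3] == b[1] - b[4] and
            b[0] - b[5] == b[1] - b[6] == b[2] - b[7] and
            b[1] - b[2] == b[6] - b[7] and
            b[3] - b[5] == b[4] - b[6])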
The partial order now tells us that only b0,b2,b3,b5 and b7 may take the value 5.
It's now possible to prove this system cannot be satisfied. I am interested in techniques from constraint satisfaction and related fields that can be used to analyse a system of inequations (!=) together with "count". Obviously Gecode can prove this by assigning values and failing; I am interested in general techniques, both to learn about constraint satisfaction, to help improve the model, and maybe to gain some understanding of the things I am investigating.
To see the problem is not soluble, we can show that none of the 6 pair-difference sets can have a difference of zero. If one did, it would generate duplicates of the wrong values or too many 5's.
For example, b0 - b1 = b3 - b4 = b5 - b6 = 0 would give b0 = b1, which is impossible since b1 cannot be 5 and 5 is the only value that may appear more than once.
Or b0 - b2 = b5 - b7 = 0 would mean b0 = b2 and b5 = b7, requiring four 5's.
So we end up with a set of inequations on the paths whose sum could be 5:
b0 != b2, b0 != b3, b0 != b5, b2 != b7, b5 != b7, and b3 != b5 (the last from b3 - b5 = b4 - b6, since b4 = b6 is impossible)
We can see that b0 != 5: if it were, only b7 among the others could also be 5, giving just two 5's. And among the remaining four values the inequations allow at most two to be 5 at once, so the whole system is insoluble.
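The pigeonhole step is small enough to check mechanically. A quick Python sketch that treats the inequations as a conflict graph and looks for the largest set of candidates that could simultaneously be 5:

from itertools import combinations

candidates = ["b0", "b2", "b3", "b5", "b7"]
conflicts = {("b0", "b2"), ("b0", "b3"), ("b0", "b5"),
             ("b2", "b7"), ("b5", "b7"), ("b3", "b5")}

def compatible(group):
    # True if no pair in the group is forced unequal.
    return all((x, y) not in conflicts and (y, x) not in conflicts
               for x, y in combinations(group, 2))

best = max((g for r in range(len(candidates) + 1)
            for g in combinations(candidates, r) if compatible(g)),
           key=len)
print(best, len(best))  # at most 2 can be 5, but we need three 5's: insoluble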
Another try at a semi-complicated deployment diagram (the content is deployment-ish).
Background: I add components one by one and it works (see it on the plantuml server here), until I get to "aabb9" (i.e. a9)...
Problem: When I add aabb9 to be the target of an "up" arrow from aabb5, I expect aabb9 to be placed above aabb5, where there is space. Like this:
Instead, the diagram layout is almost completely redone by the engine, and seemingly, the previously defined relationships are no longer "respected". So the (bad) result becomes:
Notice how the first nodes now come rightmost, and my relationships, specified to go to the right from those two (aabb1 & aabb2), are no longer "respected"/drawn as entered. Here is that same drawing with the line uncommented, and the "bad"/undesired result.
So, below is the code that works, but if you uncomment the last line, it goes bananas and re-lays out the whole thing.
Any clues to this? It would be cool to be able to create these simple diagrams with text...
Thanks! /mawi
@startuml
skinparam ranksep 5
skinparam nodesep 5
rectangle "aabb1" {
node aabb1 as a1
node aabb2 as a2
}
a1 --[hidden]> a2
control "aabb3" as a3
database "aabb4" as a4
queue "aabb5" as a5
control "aabb6" as a6
control "aabb7" as a7
database "aabb8" as a8
control "aabb9" as a9
a1 -right-> a3: Range
a2 -right-> a3: 3D Models
a3 -down-> a4: Range & Models
a3 -> a5: product.\nupsert
a5 -down-> a6: product.\nupsert
a6 -> a5: product.\nprocessed
a5 -> a7: product.processed
a7 -> a8: Data
a7 -> a5: product.\nstored
'a5 -up-> a9: product.stored
@enduml
Ok, so I figured I'd continue experimenting a bit before giving up, and I found what works for this case, and a hypothesis, but no root cause...
What works?
So, if I add a relationship/flow from the new "free" node (i.e. "aabb9") and not to it, it works. So, if I do e.g. a9 -down-> a5: product.stored, it works as expected.
And once that is done, I can add the flow from the rest of the diagram to the new node, i.e. uncomment the last line.
So the key is that I need to do the down arrow first, linking from the new node to the rest of the diagram, before the up-arrow line that was commented out.
Then the engine will layout aabb9 correctly.
To achieve the exact result I was after, I can just reverse the arrowhead of that flow, so I get a9 <-down- a5: product.stored.
But it almost seems like a bug. :-)
Comments and thoughts welcome, hope this helps some poor plantuml explorer. :)
Edit: here's the code:
@startuml
skinparam ranksep 5
skinparam nodesep 5
rectangle "aabb1" {
node aabb1 as a1
node aabb2 as a2
}
a1 --[hidden]> a2
control "aabb3" as a3
database "aabb4" as a4
queue "aabb5" as a5
control "aabb6" as a6
control "aabb7" as a7
database "aabb8" as a8
control "aabb9" as a9
a1 -right-> a3: Range
a2 -right-> a3: 3D Models
a3 -down-> a4: Range & Models
a3 -> a5: product.\nupsert
a5 -down-> a6: product.\nupsert
a6 -> a5: product.\nprocessed
a5 -> a7: product.processed
a7 -> a8: Data
a7 -> a5: product.\nstored
a9 -down-> a5: product.stored
a5 -up-> a9: product.stored
@enduml
I have some nested models with foreign key relationships going 4 levels deep.
A <- B <- C <- D
A has a set of B models, which each have a set of C models, which each have a set of D models.
I'm iterating over each model (4 layers of looping, from A down to D). This produces lots of DB hits.
I don't need any filtering at the DB fetch level, as I need all the data from all the tables, so ideally I'd like to get all the data with ONE SQL query that hits the DB (if that's possible) and then somehow have the data organized/filtered into the correct sets for each model, i.e. all pre-fetched and structured, ready for use (e.g. in a web dashboard).
There seem to be a lot of Django-related prefetch helpers and packages, but none of them seems to work the way I expect, e.g. django-auto-prefetch (which seemed ideal).
Is this a common use case (I thought it would be)?
How can I construct the ORM query to get all the data in one hit and then just use the bits I need?
NOTE: target system is raspberry pi class device (1GHz Arm processor) with eMMC storage (similar to SD card), and using SQLite as the DB backend.
NOTE: I'm also using this with django-polymorphic, which may or may not make a difference?
Thanks, Brendan.
Using one query would result in a huge amount of bandwidth, since the values for the columns of the A model will be repeated per B model per C model per D model.
Indeed, the response would look like:
a_col1 | a_col2 | b_col1 | b_col2 | c_col1 | d_col1
A1     | A1     | B1     | B1     | C1     | D1
A1     | A1     | B1     | B1     | C1     | D2
A1     | A1     | B1     | B1     | C1     | D3
A1     | A1     | B1     | B1     | C2     | D4
A1     | A1     | B1     | B1     | C2     | D5
A1     | A1     | B1     | B1     | C2     | D6
A1     | A1     | B2     | B2     | C3     | D7
A1     | A1     | B2     | B2     | C3     | D8
A1     | A1     | B2     | B2     | C3     | D9
A1     | A1     | B2     | B2     | C4     | D10
A1     | A1     | B2     | B2     | C4     | D11
A1     | A1     | B2     | B2     | C4     | D12
A2     | A2     | B3     | B3     | C5     | D13
A2     | A2     | B3     | B3     | C5     | D14
A2     | A2     | B3     | B3     | C5     | D15
A2     | A2     | B3     | B3     | C5     | D16
We would thus repeat the values of the a-columns, b-columns, etc. a large number of times, transferring a large amount of data from the database to the Python/Django layer and using a large amount of memory in Django to deserialize the response.
.prefetch_related, on the other hand, makes at most one (or two, depending on the type of relation) extra query per level, so four to seven queries in total including the base query, which minimizes the bandwidth.
You thus can fetch all objects in memory with:
for a in A.objects.prefetch_related('b_set', 'b_set__c_set', 'b_set__c_set__d_set'):
    print(a)
    for b in a.b_set.all():
        print(b)
        for c in b.c_set.all():
            print(c)
            for d in c.d_set.all():
                print(d)
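If you later need to tune the queryset used at each level (ordering, .only(), filtering of children, etc.), the same prefetching can be spelled out with Django's Prefetch objects. A sketch, assuming the question's models A-D with their default related names:

from django.db.models import Prefetch

qs = A.objects.prefetch_related(
    Prefetch("b_set", queryset=B.objects.prefetch_related(
        # Each nested queryset can carry its own ordering/filters.
        Prefetch("c_set", queryset=C.objects.prefetch_related("d_set")),
    )),
)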
I have two (1d) arrays, a long one A (size m) and a shorter one B (size n). I want to update the long array by adding each element of the short array at a particular index.
Schematically the arrays are structured like this,
A = [a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 ... am]
B = [ b1 b2 b3 b4 b5 b6 b7 b8 b9 ... bn ]
and I want to update A by adding the corresponding elements of B.
The most straightforward way is to have some index array indarray (same size as B) which tells us which index of A corresponds to B(i):
Option 1
do i = 1, size(B)
A(indarray(i)) = A(indarray(i)) + B(i)
end do
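As an aside, this update is exactly what array languages call an unbuffered scatter-add; pinning the semantics down in NumPy terms (Python, just for reference, not Fortran):

import numpy as np

A = np.zeros(14)
B = np.array([1.0, 2.0, 3.0])
ind = np.array([1, 2, 5])   # 0-based analogue of indarray
np.add.at(A, ind, B)        # A[ind[i]] += B[i], safe even with repeated indices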
However, there is an organization to this problem which I feel should allow for better performance:
1. There should be no barrier to doing this in a vectorized way, i.e. the updates for each i are independent and can be done in any order.
2. There is no need to jump back and forth in array A: the machine should loop once through the arrays, updating A only where necessary.
3. There should be no need for any temporary arrays.
What is the best way to do this in Fortran?
Option 2
One way might be using PACK, UNPACK, and a boolean mask M (same size as A) that serves the same purpose as indarray:
A = [a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 ... am]
B = [ b1 b2 b3 b4 b5 b6 b7 b8 b9 ... bn ]
M = [. T T T . T T . . T . T T T T . ]
(where T represents .true. and . is .false.).
And the code would just be
A = UNPACK(PACK(A, M) + B, M, A)
This is very concise and maybe satisfies (1), and sort of (2) (it seems to make two passes through the arrays instead of one). But I fear the machine will create a few temporary arrays in the process, which seems unnecessary.
Option 3
What about using where with UNPACK?
where (M)
  A = A + UNPACK(B, M, 0.0d0)
end where
This seems about the same as option 2 (two loops and maybe creates temporary arrays). It also has to fill the M=.false. elements of the UNPACK'd array with 0's which seems like a total waste.
Option 4
In my situation the .true. elements of the mask will usually come in contiguous blocks (i.e. a few trues in a row, then a run of falses, then another block of trues, etc.). Maybe this could lead to something similar to option 1. Say there are K of these .true. blocks. I would need an array indstart (of size K) giving the index into A of the start of each true block, and an array blocksize (also of size K) with the length of each block.
j = 1   ! j tracks the current position in B
do i = 1, size(indstart)
  i0 = indstart(i)
  i1 = i0 + blocksize(i) - 1
  A(i0:i1) = A(i0:i1) + B(j:j+blocksize(i)-1)
  j = j + blocksize(i)
end do
At least this makes only one pass through the arrays, and the code is more explicit about the fact that there is no jumping back and forth within them. But I don't think the compiler can figure that out (blocksize could contain negative values, for example), so this option probably won't vectorize.
--
Any thoughts on a nice way to do this? In my situation the arrays indarray, M, indstart, and blocksize would be created once but the adding operation must be done many times for different arrays A and B (though these arrays will have constant sizes). The where statement seems like it could be relevant.
I need to generate all possible rankings of n documents. I understand that the permutations of an array {1, 2, ..., n} give me the set of all possible orderings.
My problem is a bit more complex, as each document can take one of 2 possible types. Therefore, in all there are n! * 2^n possible rankings.
For instance, let us say I have 3 documents a, b, and c. Then possible rankings are the following:
a1 b1 c1
a1 b1 c2
a1 b2 c1
a1 b2 c2
a2 b1 c1
a2 b1 c2
a2 b2 c1
a2 b2 c2
a1 c1 b1
a1 c1 b2
a1 c2 b1
a1 c2 b2
a2 c1 b1
a2 c1 b2
a2 c2 b1
a2 c2 b2
b1 a1 c1
b1 a1 c2
b1 a2 c1
b1 a2 c2
b2 a1 c1
b2 a1 c2
...
What would be an elegant way to generate such rankings?
It's a kind of cross product between the permutations of B = {a, b, ...} and the k-tuples over T = {1, 2} (all 2^k assignments of a type to each position), where k is the number of elements in B. Say we take a p from Perm(B), e.g. p = (b,c,a), and a c from the k-tuples over T, e.g. c = (2,1,1); then we merge p and c into (b2,c1,a1).
I don't really know if it's elegant, but I would choose an algorithm that generates the permutations of B sequentially (cf. TAOCP Volume 4 fascicle 2b) and, for each permutation, apply the above "product" with all the k-tuples, generated sequentially or stored in an array if k is small (cf. TAOCP Volume 4 fascicle 3a).
B = {a, b, c, ...}
T = {1, 2}
k = length(B)
reset_perm(B)
do
  p = next_perm(B)
  reset_tuple(T, k)
  do
    c = next_tuple(T, k)
    output product(p, c)
  while not last_tuple(T, k)
while not last_perm(B)
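A concrete version of the same double loop using Python's itertools (a sketch: permutations() plays next_perm and product() plays next_tuple), which reproduces the ordering of the example listing above:

from itertools import permutations, product

docs = ["a", "b", "c"]
types = [1, 2]

for p in permutations(docs):                  # outer loop: orderings
    for c in product(types, repeat=len(docs)):  # inner loop: type tuples
        print(" ".join(f"{d}{t}" for d, t in zip(p, c)))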
By "counting in facet results" I mean solving the following problem:
I have 7 documents:
A1 B1 C1
A2 B1 C1
A3 B2 C1
A4 B2 C2
A5 B3 C2
A6 B3 C2
A7 B3 C2
If I make a facet query on field B, I get the result: B1=2, B2=2, B3=3.
A1 B1 C1
A2 B1 C1 2 - faceting by B
--------------====
A3 B2 C1
A4 B2 C2 2 - faceting by B
--------------====
A5 B3 C2
A6 B3 C2
A7 B3 C2 3 - faceting by B
--------------====
I want to get additional information, something like a count within the facet results, by field C. So how can I query to get a result similar to the following:
A1 B1 C1
A2 B1 C1 2, 1 - faceting by B, count of C in facet results
--------------=======
A3 B2 C1
A4 B2 C2 2, 2 - faceting by B, count of C in facet results
--------------=======
A5 B3 C2
A6 B3 C2
A7 B3 C2 3, 1 - faceting by B, count of C in facet results
--------------=======
Thanks
What you need is Facet Pivots.
These will give you the results and counts for hierarchies.
They are available in the Solr 4.0 trunk build, so you may need to apply the patch.
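With pivot faceting the request would look something like this (a sketch, assuming your fields really are named B and C):

q=*:*&facet=true&facet.pivot=B,C

The response then nests, under each value of B, the counts of the C values within that slice, which is exactly the "2, 1 / 2, 2 / 3, 1" breakdown sketched above.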
References -
http://wiki.apache.org/solr/HierarchicalFaceting
http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting