t-sql - select all combinations of groups of rows in single table - sql-server

Ok, I have a table like this:
ThingID SubthingID ThingLevel
1 1 0
1 2 0
1 3 0
1 4 0
2 14 1
2 17 1
3 22 1
3 950 1
I need to select groups of subthings such that I end up with one subthing from a level 0 thing, and all the combinations of subthings, one each from the level 1 things. Note that each subthing belongs to its thing - they're not interchangeable. So there can't be, say, a combination of thing 2 with subthing 950. Also, things come in two levels - level 0 and level 1. A level 1 thing is always level 1, and a level 0 thing is always level 0 - really, it means that level 1 things can be combined with other level 1 or level 0 things, but level 0 things can only be combined with level 1 things.
So the output would look like:
GroupID ThingID SubthingID ThingLevel
1 1 1 0
1 2 14 1
1 3 22 1
2 1 2 0
2 2 14 1
2 3 22 1
3 1 3 0
3 2 14 1
3 3 22 1
. . . .
. . . .
. . . .
x 1 4 0
x 2 17 1
x 3 950 1
There are multiple level 0 things, each with one to many subthings. There are multiple level 1 things, each with one to many subthings.
Offhand it would seem like nested loops would be the answer:
For each level 0 thing begin
    for each level 1 thing subthing begin
        etc...
But that's obviously not going to handle variable numbers of level one things.
Is there a way to do this with recursion?
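To make the target output concrete, here is a minimal sketch in Python rather than T-SQL. It treats each group as one element of a Cartesian product: one level 0 subthing times one subthing from every level 1 thing, so it handles any number of level 1 things without nesting loops by hand. The function name `subthing_groups` and the tuple layout are assumptions made for illustration, and the groups come out in a different order than in the sample output above.

```python
from itertools import product

# (ThingID, SubthingID, ThingLevel) rows from the question's table
rows = [
    (1, 1, 0), (1, 2, 0), (1, 3, 0), (1, 4, 0),
    (2, 14, 1), (2, 17, 1),
    (3, 22, 1), (3, 950, 1),
]

def subthing_groups(rows):
    """Return every group: one level 0 subthing plus one subthing per level 1 thing."""
    level0 = [r for r in rows if r[2] == 0]
    # Bucket level 1 subthings by their thing so subthings never cross things.
    level1_by_thing = {}
    for r in rows:
        if r[2] == 1:
            level1_by_thing.setdefault(r[0], []).append(r)
    # product() accepts a variable number of iterables, one per level 1 thing.
    return [list(combo) for combo in product(level0, *level1_by_thing.values())]

groups = subthing_groups(rows)
for group_id, group in enumerate(groups, start=1):
    for thing_id, subthing_id, level in group:
        print(group_id, thing_id, subthing_id, level)
```

With the sample data this yields 4 x 2 x 2 = 16 groups. In SQL the same idea is a cross join between the per-thing subthing sets, which is why a fixed number of joins cannot cover a variable number of level 1 things.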

Related

How to aggregate number of notes sent to each user?

Consider the following tables
group (obj_id here is user_id)
group_id obj_id role
--------------------------
100 1 A
100 2 root
100 3 B
100 4 C
notes
obj_id  ref_obj_id  note    note_id
-------------------------------------------
1       2                   10
1       3                   10
1       0           foobar  10
1       4                   20
1       2                   20
1       0           barbaz  20
2       0           caszes  30
2       1                   30
4       1                   70
4       0           taz     70
4       3                   70
Note: a note in the system can be assigned to multiple users (for instance: an admin could write "sent warning to 2 users" and link it to 2 user_ids). The first user the note gets linked to is stored differently than the other linked users. The note itself is linked to the first linked user only. Whenever group.obj_id = notes.obj_id then ref_obj_id = 0 and note <> null
I need to make an overview of the notes per user. Normally I would do this by joining on group.obj_id = notes.obj_id, but here this goes wrong because of ref_obj_id being 0 (in which case I should join on notes.obj_id).
There are 4 notes in this system (foobar, barbaz, caszes and taz).
The desired output is:
obj_id  user_is_primary  notes_primary  user_is_linked  notes_linked
-------------------------------------------------------------------
1       2                10;20          2               30;70
2       1                30             2               10;20
3       0                               2               10;70
4       1                70             1               20
How can I get to this aggregated result?
I hope that I was able to explain the situation clearly; perhaps it is my inexperience but I find the data model not the most straightforward.
Couldn't you simply put this in the ON clause of your join?
case when notes.ref_obj_id = 0 then notes.obj_id else notes.ref_obj_id end = group.obj_id
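The CASE expression picks the user a notes row really belongs to: notes.obj_id when ref_obj_id is 0 (the primary user), otherwise ref_obj_id (a linked user). As a rough illustration of that rule outside SQL, the Python sketch below applies it to the sample rows and builds the two note lists per user. The function name `notes_overview` and the tuple layout are assumptions for this example only.

```python
from collections import defaultdict

# (obj_id, ref_obj_id, note, note_id) rows from the question's notes table;
# None stands for an empty note column.
notes = [
    (1, 2, None, 10), (1, 3, None, 10), (1, 0, "foobar", 10),
    (1, 4, None, 20), (1, 2, None, 20), (1, 0, "barbaz", 20),
    (2, 0, "caszes", 30), (2, 1, None, 30),
    (4, 1, None, 70), (4, 0, "taz", 70), (4, 3, None, 70),
]
users = [1, 2, 3, 4]  # obj_id values from the group table

def notes_overview(users, notes):
    """Map each user to (primary note ids, linked note ids)."""
    primary = defaultdict(list)
    linked = defaultdict(list)
    for obj_id, ref_obj_id, _note, note_id in notes:
        if ref_obj_id == 0:
            # Same rule as the CASE expression: the row belongs to obj_id.
            primary[obj_id].append(note_id)
        else:
            # Otherwise the row belongs to the referenced (linked) user.
            linked[ref_obj_id].append(note_id)
    return {u: (sorted(primary[u]), sorted(linked[u])) for u in users}

overview = notes_overview(users, notes)
for user, (prim, link) in overview.items():
    print(user, len(prim), ";".join(map(str, prim)), len(link), ";".join(map(str, link)))
```

The counts and semicolon-joined lists printed here correspond to the user_is_primary, notes_primary, user_is_linked and notes_linked columns of the desired output.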

Comparisons across multiple rows in Stata (household dataset)

I'm working on a household dataset and my data looks like this:
input id id_family mother_id male
1 2 12 0
2 2 13 1
3 3 15 1
4 3 17 0
5 3 4 0
end
What I want to do is identify the mother in each family. A mother is a member of the family whose id is equal to one of the mother_id's of another family member. In the example above, for the family with id_family=3, individual 5 has mother_id=4, which makes individual 4 her mother.
I create a family size variable that tells me how many members there are per family. I also create a rank variable for each member within a family. For families of three, I then have the following piece of code that works:
bysort id_family: gen family_size=_N
bysort id_family: gen rank=_n
gen mother=.
bysort id_family: replace mother=1 if male==0 & rank==1 & family_size==3 & (id[_n]==mother_id[_n+1] | id[_n]==mother_id[_n+2])
bysort id_family: replace mother=1 if male==0 & rank==2 & family_size==3 & (id[_n]==mother_id[_n-1] | id[_n]==mother_id[_n+1])
bysort id_family: replace mother=1 if male==0 & rank==3 & family_size==3 & (id[_n]==mother_id[_n-1] | id[_n]==mother_id[_n-2])
What I get is:
id id_family mother_id male family_size rank mother
1 2 12 0 2 1 .
2 2 13 1 2 2 .
3 3 15 1 3 1 .
4 3 17 0 3 2 1
5 3 4 0 3 3 .
However, in my real data set, I have to get the mother for families of size 4 and higher (up to 9), which makes this procedure very inefficient (in the sense that there are too many row elements to compare "manually").
How would you obtain this in a cleaner way? Would you make use of permutations to index the rows? Or would you use a for-loop?
Here's an approach using merge.
// create sample data
clear
input id id_family mother_id male
1 2 12 0
2 2 13 1
3 3 15 1
4 3 17 0
5 3 4 0
end
save families, replace
clear
// do the job
use families
drop id male
rename mother_id id
sort id_family id
duplicates drop
list, clean abbreviate(10)
save mothers, replace
use families, clear
merge 1:1 id_family id using mothers, keep(master match)
generate byte is_mother = _merge==3
list, clean abbreviate(10)
The second list yields
id id_family mother_id male _merge is_mother
1. 1 2 12 0 master only (1) 0
2. 2 2 13 1 master only (1) 0
3. 3 3 15 1 master only (1) 0
4. 4 3 17 0 matched (3) 1
5. 5 3 4 0 master only (1) 0
where I retained _merge only for expositional purposes.
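For readers who do not use Stata, the same idea can be sketched in Python: collect every (id_family, mother_id) pair that occurs, then flag a person as a mother when their own (id_family, id) pair is in that set. This is only an illustration of the merge logic, not a translation of the Stata commands; the name `flag_mothers` and the dictionary layout are assumptions.

```python
# Sample household data from the question
people = [
    {"id": 1, "id_family": 2, "mother_id": 12, "male": 0},
    {"id": 2, "id_family": 2, "mother_id": 13, "male": 1},
    {"id": 3, "id_family": 3, "mother_id": 15, "male": 1},
    {"id": 4, "id_family": 3, "mother_id": 17, "male": 0},
    {"id": 5, "id_family": 3, "mother_id": 4, "male": 0},
]

def flag_mothers(people):
    """Add is_mother=1 when someone in the same family names this person as mother."""
    # Plays the role of the deduplicated "mothers" dataset in the Stata answer.
    mother_keys = {(p["id_family"], p["mother_id"]) for p in people}
    # Plays the role of the merge on id_family and id.
    return [
        dict(p, is_mother=int((p["id_family"], p["id"]) in mother_keys))
        for p in people
    ]

flagged = flag_mothers(people)
for p in flagged:
    print(p["id"], p["id_family"], p["mother_id"], p["male"], p["is_mother"])
```

Because the lookup is keyed on the family and the id together, the approach does not depend on family size.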

3+ dimensional truth table in APL

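I would like to enumerate all the combinations (tuples of values) of 3 or more finite-valued variables which satisfy a given condition. In math notation: { (a, b, c) : a ∈ A, b ∈ B, c ∈ C, P(a, b, c) }
For example (inspired by Project Euler problem 9): { (a, b, c) : a ∈ 1..3, b ∈ 1..4, c ∈ 1..5, a ≤ b ≤ c }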
The truth tables for two variables at a time are easy enough:
a ∘.≤ b
1 1 1 1
0 1 1 1
0 0 1 1
b ∘.≤ c
1 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
After much head-scratching, I managed to combine them, by computing the ∧ of every 4-valued row of the former with each 4-valued column of the latter, and disclosing (⊃) on the correct axis, between 1 and 2:
⎕← tt ← ⊃[1.5] (⊂[2] a ∘.≤ b) ∘.∧ (⊂[1] b ∘.≤ c)
1 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1

0 0 0 0 0
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1

0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 0 1 1
Then I could use its ravel to filter all possible tuples of values:
⊃ (,tt) / , a ∘., b ∘., c
1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
1 2 2
1 2 3
...
3 3 5
3 4 4
3 4 5
Is this the best approach to this particular class of problems in APL?
Is there an easier or faster formula for this example, or for the general case?
More generally, comparing my (naïve?) array approach above to traditional scalar languages, I can see that I'm translating each loop into an additional dimension: 3 nested loops become a 3-rank truth table:
for c in 1..NC:
    for b in 1..min(c, NB):
        for a in 1..min(b, NA):
            collect (a,b,c)
But in a scalar language one can effect optimizations along the way, for example breaking loops as soon as possible, or choosing the loop boundaries dynamically. In this case I don't even need to test for a ≤ b ≤ c, because it's implicit in the loop boundaries.
In this example both approaches have O(N³) complexity, so their runtime will only differ by a factor. But I'm wondering: how could I write the array solution in a more optimized way, if I needed to do so?
Are there any good books or online resources that address algorithmic issues or best practices in APL?
Here's an alternative approach. I'm not sure if it would run faster.
Following your algorithm for scalar languages, the possible values of c are
⎕IO←0
c←1+⍳NC
In the inner loops the values for b and a are
b←1+⍳¨NB⌊c
a←1+⍳¨¨NA⌊b
If we combine those
r←(⊂¨¨¨a,¨¨¨b),¨¨¨c
we get a nested array of (a,b,c) triplets which can be flattened and rearranged in a matrix
r←∊r
(((⍴r)÷3),3)⍴r
ADD:
Morten Kromberg sent me the following solution. On Dyalog APL it's ~ 30 times more efficient than the one above:
⎕IO←1
AddDim←{0≡⍵:⍪⍳⍺ ⋄ n←0⌈⍺-x←¯1+⊢/⍵ ⋄ (n⌿⍵),∊x+⍳¨n}
TTable←{⊃AddDim/⌽0,⍵}
TTable 3 4 5
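For comparison, the scalar loops from the question can be written as runnable Python. This is a sketch of the loop-bounds idea only, not a rendering of either APL solution; the function name `ordered_triples` is an assumption, and the result is sorted so that it lists the triples in the same order as the filtered output shown in the question.

```python
def ordered_triples(na, nb, nc):
    """All (a, b, c) with 1<=a<=na, 1<=b<=nb, 1<=c<=nc and a <= b <= c."""
    triples = []
    for c in range(1, nc + 1):
        # b never exceeds c, so b <= c needs no explicit test.
        for b in range(1, min(c, nb) + 1):
            # a never exceeds b, so a <= b needs no explicit test either.
            for a in range(1, min(b, na) + 1):
                triples.append((a, b, c))
    return sorted(triples)

triples = ordered_triples(3, 4, 5)
for t in triples:
    print(*t)
```

The loop bounds do the filtering, so only the qualifying triples are ever generated, whereas the outer-product approach materialises all 3 x 4 x 5 cells before compressing.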

Check if 2d shape composed of block has been cut

I know that the title of this topic might be confusing, but I didn't know how to explain it in a single sentence!
I'll try to be more clear, I have a 2d array of boolean values, every value states if that particular position (or block) is alive or not.
Let's make an example:
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
This array contains 16 "alive" blocks, now I can "kill" some blocks, changing their state from 1 to 0.
What I would like to do is to know if after a "kill", the group splits in two or more separate groups, for example:
1 1 0 1
1 1 0 1
0 1 0 1
1 1 1 1
This shape is still "intact", since the group of 0 is not cutting any of the 1 groups, but in this case:
1 1 0 1
1 1 0 1
0 0 0 1
1 1 1 1
Now I've killed the only block that was keeping all the 1s together, so the shape has been divided into two smaller groups!
I've tried checking the neighbours of the last killed block, but then I can't be sure about other possible connections in the shape.
I've also tried a pathfinding algorithm, but this operation should be very fast and pathfinding is too complex.
How can I achieve this?
Pick any of the alive blocks and do a flood-fill and then check if it got to all the other live blocks.
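A minimal sketch of that flood-fill check in Python, assuming blocks count as connected only when they touch horizontally or vertically (the question does not say whether diagonals count). The function name `is_connected` is made up for this example.

```python
from collections import deque

def is_connected(grid):
    """Return True if all live (1) cells form a single 4-connected group."""
    alive = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]
    if not alive:
        return True  # no live blocks left, so nothing can be split
    start = alive[0]
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        # Visit the four orthogonal neighbours that are alive and unvisited.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                    and grid[nr][nc] and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    # The shape is intact exactly when the fill reached every live block.
    return len(seen) == len(alive)

intact = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
]
cut = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
print(is_connected(intact), is_connected(cut))
```

Each call visits every live block at most once, so the cost is linear in the number of cells, which should be cheap for small grids.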

Comparing adjacent elements in MATLAB

Does anyone know how I can compare the elements in an array with the adjacent elements?
For example, if I have an array:
0 0 0 1 1 1 1 0
0 1 1 1 1 1 1 0
0 1 0 1 1 1 1 0
0 1 1 1 1 1 0 0
0 0 0 0 1 1 1 1
1 1 1 1 1 1 1 1
Is there a way to cycle through each element and perform a logical test of whether the elements around it are equal to 1?
Oops, it looks like someone is doing a homework assignment. Game of life maybe?
There are many ways to do such a test. But learn to do it in a vectorized form. This involves understanding how matlab does indexing, and how the elements of a 2-d array are stored in memory. That will take some time to explain in detail, more than I want to do at this exact moment. I would definitely recommend you learn it though.
Until then, I'll just suggest that if you really are doing the game of life, then the best trick is to use conv2. Thus,
A =[0 0 0 1 1 1 1 0
0 1 1 1 1 1 1 0
0 1 0 1 1 1 1 0
0 1 1 1 1 1 0 0
0 0 0 0 1 1 1 1
1 1 1 1 1 1 1 1];
B = conv2(A,[1 1 1;1 0 1;1 1 1],'same')
B =
1 2 4 4 5 5 3 2
2 2 5 6 8 8 5 3
3 4 8 7 8 7 4 2
2 2 4 5 7 7 6 3
3 5 6 7 7 7 6 3
1 2 2 3 4 5 5 3
Loren has recently posted about this very issue: http://blogs.mathworks.com/loren/2010/01/19/mathematical-recreations-tweetable-game-of-life/ - lots of interesting things can be learned by studying the code in that post and its comments
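To show what the conv2 call computes, here is an explicit loop version sketched in Python: for each cell it sums the eight surrounding cells and treats anything outside the array as 0, which is the effect of the 'same' option with this symmetric kernel. The function name `neighbour_counts` is an assumption for this illustration.

```python
def neighbour_counts(grid):
    """Count the live neighbours of every cell; cells outside the grid count as 0."""
    rows, cols = len(grid), len(grid[0])
    counts = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Skip the cell itself (the 0 in the middle of the kernel).
                    if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                        counts[r][c] += grid[r + dr][c + dc]
    return counts

A = [
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
B = neighbour_counts(A)
for row in B:
    print(*row)
```

A cell whose count equals 8 has all of its neighbours set to 1; border cells can never reach 8 under this zero-padding convention.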
