Drupal 7 Views: Grouping terms with content - drupal-7

I have a taxonomy vocabulary that goes like this:
TERM A
-- Term 1
---- Term 1a
---- Term 1b
-- Term 2
---- Term 2a
---- Term 2b
TERM B
Now, for example, Term 1a contains 5 nodes
and Term 1b contains 3 nodes.
I would like a view that shows this:
TERM A (as a title)
Term 1 (2 fields) but only from 1 node
Term 2 (2 fields) but only from 1 node
TERM B (as a title)
...
I tried it with Grouping fields, but then I get a result like this:
---- Term 1a
---- Term 1b
I can't figure it out...

You have to create 2 views.
The first returns your list of terms (View Terms), and the second is embedded in the first one with a Conditional Filter (on the Term Reference).
Users explain this trick in this post: https://drupal.org/node/1197752
Hope it's useful for you :)

Related

How can order query result based on word occurrence count

I have a product table and want to search the Tag column so that the results are sorted by how many of the search words each row contains.
ID | Tag
---------------------------------------
1 | LG television
2 | BOSCH vacuum cleaner 55 mm
3 | SONY home theater 55 watt
---------------------------------------
String to search: LG 55 vacuum theater home
Desired results:
1. SONY home theater 55 watt (contains three words: 55,theater,home)
2. BOSCH vacuum cleaner 55 mm (contains two words: 55,vacuum)
3. LG television (contains one word: LG)
There is a solution in Find string according to words count that uses LIKE and is very slow.
I want to implement it with FULLTEXT search.
UPDATE: I tried the solution below, but the results are wrong:
SELECT ft.[Rank], p.Tag
FROM tblProducts AS p
INNER JOIN FREETEXTTABLE(tblProducts, Tag, 'LG 55 vacuum theater home') AS ft
ON ft.[Key] = p.ProductID
ORDER BY ft.[Rank] DESC;
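To pin down the ordering the question asks for, here is a minimal sketch in plain Python. It is not a full-text search and says nothing about how SQL Server computes its rank; it only counts how many distinct search words appear in each tag and sorts by that count. The function name `rank_by_matches` is hypothetical.

```python
def rank_by_matches(tags, query):
    """Return (match_count, tag) pairs, best match first.

    A match is a whole word of the tag that also appears in the query,
    compared case-insensitively. This only illustrates the desired
    ordering; it is not how a full-text index scores rows.
    """
    wanted = set(query.lower().split())
    scored = []
    for tag in tags:
        words = set(tag.lower().split())
        scored.append((len(words & wanted), tag))
    # Stable sort: rows with equal counts keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


tags = [
    "LG television",
    "BOSCH vacuum cleaner 55 mm",
    "SONY home theater 55 watt",
]
for count, tag in rank_by_matches(tags, "LG 55 vacuum theater home"):
    print(count, tag)
```

Any full-text query can be checked against this reference ordering on a small sample of rows.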

Stata: Observation-pairwise calculation

input X group
21 1
62 1
98 1
12 2
87 2
end
Now I try to calculate a measure as follows:
$$ \sum_{g} \left | X_{ig}-X_{jg} \right | $$
where $i$ and $j$ ($i \neq j$) index observations and $g$ indexes the group variable (here, 1 and 2).
How can I calculate this number using loops?
Looks like a Gini mean difference, apart from a scaling factor. There are numerous user-written commands already in this territory. There is (unusually) a summary within the Stata manual at [R] inequality.
In addition, this is related to the second L-moment. See the lmoments command from SSC.
You need not calculate this through a double loop over indexes. It collapses to a linear combination of the order statistics.
LATER: See David's 1998 paper which is open-access at
https://doi.org/10.1214/ss/1028905831
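To illustrate the point about order statistics, here is a small sketch in Python rather than Stata. It assumes each unordered pair within a group is counted once; in that case the k-th smallest of n values (k starting at 1) gets the weight 2k - n - 1. The function names are made up for this example.

```python
from collections import defaultdict


def sum_abs_pairwise(values):
    """Sum of |x_i - x_j| over unordered pairs, without a double loop."""
    xs = sorted(values)
    n = len(xs)
    # The k-th smallest value is added (k - 1) times and subtracted
    # (n - k) times, which gives the weight 2k - n - 1.
    return sum((2 * k - n - 1) * x for k, x in enumerate(xs, start=1))


def per_group_sums(rows):
    """rows is an iterable of (X, group) pairs, as in the input block."""
    groups = defaultdict(list)
    for x, g in rows:
        groups[g].append(x)
    return {g: sum_abs_pairwise(v) for g, v in groups.items()}


rows = [(21, 1), (62, 1), (98, 1), (12, 2), (87, 2)]
sums = per_group_sums(rows)
print(sums, sum(sums.values()))  # {1: 154, 2: 75} 229
```

If ordered pairs are wanted instead (both i,j and j,i), each group sum simply doubles.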

Checking for node repetition in multi-parent tree

I've got a pretty simple tree implementation in SQL:
CREATE TABLE [dbo].[Nodes] (
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NULL
);
CREATE TABLE [dbo].[NodeNodes] (
[ParentNodeId] [int] NOT NULL,
[ChildNodeId] [int] NOT NULL
);
My tree implementation is such that a node can have multiple parents. This is so the user can create custom trees that group together commonly used nodes. For example:
     1            8          9
   /   \         / \        / \
  2     3       4   7      2   6
 / \   / \                / \
4   5 6   7              4   5
Node | Parents | Children
---------------------------
1 | - | 2,3
2 | 1,9 | 4,5
3 | 1 | 6,7
4 | 2,8 | -
5 | 2 | -
6 | 3,9 | -
7 | 3,8 | -
8 | - | 4,7
9 | - | 2,6
So there are three trees which are indicated by the three nodes with no parent. My problem is validating a potential relationship when the user adds a node as a child of another. I would like no node to appear twice in the same tree. For example, adding node 2 as a child of node 6 should fail because that would cause node 2 to appear twice in 1's tree and 9's tree. I'm having trouble writing an efficient algorithm that does this.
My first idea was to find all the roots of the prospective parent, flatten the trees of the roots to get one list of nodes per tree, then intersect those lists with the prospective child, and finally pass the validation only if all of the resultant intersected lists are empty. Going with the example, I'd get these steps:
1) Trace prospective parent through all parents to roots:
6->3->1
6->9
2) Flatten trees of the roots
1: {1,2,3,4,5,6,7}
9: {2,4,5,6,9}
3) Intersect lists with the prospective child
1: {1,2,3,4,5,6,7}^{2} = {2}
9: {2,4,5,6,9}^{2} = {2}
4) Only pass if all result lists are empty
1: {2} != {} ; fail
9: {2} != {} ; fail
This process works, except for the fact that it requires putting entire trees into memory. I have some trees with 20,000+ nodes and this takes almost a minute to run. This performance isn't a 100% dealbreaker, but it is very frustrating. Is there a more efficient algorithm to do this?
Edit 4/2 2pm
The above algorithm doesn't actually work. deroby pointed out that adding 9 as a child to 7 will be passed by the algorithm but shouldn't be. The problem is that adding a node with children to another node will succeed as long as the node isn't repeated -- it doesn't validate the children.
A year later I stumbled upon my own question and I decided I would add my solution. It turns out I had just forgotten my basic data structures. What I originally thought was a simple tree was actually a directed graph, and what I was testing for was a cycle. Seeing as how cycle detection is a pretty common thing, there should be numerous solutions and discussions about it out there on the internets. See Best algorithm for detecting cycles in a directed graph for one example.
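A minimal sketch of that cycle check in Python, assuming the NodeNodes rows have been loaded as (parent, child) tuples. Adding parent -> child creates a cycle exactly when the parent is already reachable from the child. Note that this tests only for cycles; it is not the stricter "no node twice in the same tree" rule from the original question. The function name is hypothetical.

```python
from collections import defaultdict


def would_create_cycle(edges, parent, child):
    """True if adding the edge parent -> child would close a cycle."""
    children = defaultdict(list)
    for p, c in edges:
        children[p].append(c)
    # Depth-first walk over the descendants of the prospective child.
    stack = [child]
    seen = set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return False


# Edges taken from the Node | Parents | Children table above.
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7),
         (8, 4), (8, 7), (9, 2), (9, 6)]
print(would_create_cycle(edges, 4, 1))  # True: 4 is a descendant of 1
print(would_create_cycle(edges, 6, 2))  # False: no cycle, although 2 would repeat
```

Only the descendants of the prospective child are visited, so the whole graph does not need to be flattened for each check.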

How to store bidirectional relationships

I am writing some code to find duplicate customer details in a database. I'll be using Levenshtein distance.
However, I am not sure how to store the relationships. I use databases all the time but have never come across this situation and wondered if someone could point me in the right direction.
What confuses me is how to store the bidirectional nature of the relationship.
I've started to put some examples below, but wondered if there is a best practice for storing this type of data.
Example data
id, address
001, 5 Main Street
002, 5 Main St.
003, 5 Main Str
004, 6 High Street
005, 7 Low Street
006, 7 Low St
Suggestion 1
customer_id1, customer_id2, relationship_strength
001, 002, 0.74
001, 003, 0.77
002, 003, 0.76
005, 006, 0.77
I'm not happy with this approach, as it sort of implies a one-way relationship from customer_id1 to customer_id2. Unless, of course, I include all relationships both ways, but that would double the processing time and the size of the tables.
E.g. I would need to include: 002, 001, 0.74
Suggestion 2
customer_id, grouping_id
001, 1
002, 1
003, 1
005, 2
006, 2
The way to deal with symmetric relations in a relational system is as follows:
Choose a canonical form in which the symmetric pairs are stored, e.g. customer_id1 < customer_id2.
Define a view SYMM_TBL as select id1,id2,... from ... UNION select id2 as id1,id1 as id2, ... FROM ...
A decent DBMS should not penalize you much on performance when you query this view.
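Here is a small sketch of that idea using Python's built-in sqlite3 module and the pairs from Suggestion 1. The table name `pairs`, the column names and the helper `matches_for` are assumptions made for this example; the CHECK constraint is one way to enforce the canonical form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pairs (
    id1   INTEGER NOT NULL,
    id2   INTEGER NOT NULL,
    score REAL    NOT NULL,
    PRIMARY KEY (id1, id2),
    CHECK (id1 < id2)          -- canonical form: smaller id first
);
CREATE VIEW symm_tbl AS
    SELECT id1, id2, score FROM pairs
    UNION
    SELECT id2 AS id1, id1 AS id2, score FROM pairs;
""")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [(1, 2, 0.74), (1, 3, 0.77), (2, 3, 0.76), (5, 6, 0.77)],
)


def matches_for(customer_id):
    """All pairs involving customer_id, seen from that customer's side."""
    return conn.execute(
        "SELECT id1, id2, score FROM symm_tbl WHERE id1 = ? ORDER BY id2",
        (customer_id,),
    ).fetchall()


print(matches_for(2))  # [(2, 1, 0.74), (2, 3, 0.76)]
```

Each pair is stored once, and the view lets you query from either side without caring which id was stored first.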
What we have here is a graph in which each node has a relationship (edit distance) with every other node. This is not in the normal range of data models. It is also not a permanent feature of your database (assuming you resolve the business processes which led to the duplicate data), so it isn't worth sweating over the solution which best fits relational theory. What we need is a practical solution.
Think of it as a matrix. If we go for the optimum processing we won't execute the duplicate scorings. So we score Address 1 against all the other Addresses, we score Address 2 against all the other Addresses except Address 1, we score Address 3 against all the other Addresses except Addresses 1 and 2, etc. And what we end up with is a bit like a football league table:
          addr
          1    2    3    4    5
addr 1    -   95   95   80   76
     2    -    -  100   75   72
     3    -    -    -   75   72
     4    -    -    -    -   83
     5    -    -    -    -    -
This data can best be stored in suggestion 1, a table of ID1, ID2, SCORE. Although we do need to pivot the data to get the output looking like that :)
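The "score each pair only once" idea can be sketched in Python as follows. This assumes plain Levenshtein edit distance as the score (lower means more similar), which is not the same scale as the percentages in the table above. The function names are made up for this example.

```python
from itertools import combinations


def levenshtein(s, t):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def score_pairs(addresses):
    """Return (id1, id2, distance) rows with id1 < id2, each pair once."""
    return [(a, b, levenshtein(addresses[a], addresses[b]))
            for a, b in combinations(sorted(addresses), 2)]


addresses = {
    1: "5 Main Street", 2: "5 Main St.", 3: "5 Main Str",
    4: "6 High Street", 5: "7 Low Street", 6: "7 Low St",
}
for row in score_pairs(addresses):
    print(row)
```

For n addresses this produces n(n-1)/2 rows, which is the upper triangle of the matrix.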
In a proper league table there are two sets of scores - Home and Away - so the table is symmetrical. But that doesn't apply here, as the edit distance for 1 > 2 is the same as 2 > 1. However, it would make querying the results more straightforward if the result set included the mirrored scores. That is, for records (1,5,76), (2,5,72), etc we generate records (5,1,76), (5,2,72). This could be done at the end of the scoring process.
          addr
          1    2    3    4    5
addr 1    -   95   95   80   76
     2   95    -  100   75   72
     3   95  100    -   75   72
     4   80   75   75    -   83
     5   76   72   72   83    -
Of course, this is mainly a presentational thing, so it only needs to be done for display purposes, e.g. exporting the data to a spreadsheet. We can still get all the scores for, say, Address 5 in a readable fashion without mirroring the scores, using a simple SQL statement:
select case when id1 = 5 then id1 else id2 end as id1
, case when id1 = 5 then id2 else id1 end as id2
, score
from your_table
where id1 = 5
or id2 = 5
/
As always it depends on what you want to do with the data once you've calculated it.
Assuming it's simply to identify or locate duplicates then your suggestion 1 is what I'd use, i.e. a second table that simply stores the pairs and the strengths. My only suggestion is to make the strengths a scaled integer rather than a decimal.

Genetic algorithms theoretical question

I'm currently reading "Artificial Intelligence: A Modern Approach" (Russell+Norvig) and "Machine Learning" (Mitchell) - and trying to learn basics of AINN.
In order to understand a few basic things I have two 'greenhorn' questions:
Q1: In a genetic algorithm given the two parents A and B with the chromosomes 001110 and 101101, respectively, which of the following offspring could have resulted from a one-point crossover?
a: 001101
b: 001110
Q2: Which of the above offspring could have resulted from a two-point crossover? and why?
Please advise.
It is not possible to find parents if you do not know the inverse-crossover function (so that AxB => (a,b) & (any a) => (A,B)).
Usually the 1-point crossover function is:
a = A1 + B2
b = B1 + A2
Even if you know a and b you cannot solve the system (system of 2 equations with 4 variables).
If you know any 2 parts of any A or/and B then it can be solved (system of 2 equations with 2 variables). This is the case for your question as you provide both A and B.
In general, a crossover function has no inverse, so you have to find the solution by reasoning or, if you know the parents, perform the crossover and compare.
So to make a generic formula for you we should know 2 things:
Crossover function.
Inverse-crossover function.
The 2nd one is not usually used in GAs as it is not required.
Now, I'll just answer your questions.
Q1: In a genetic algorithm given the two parents A and B with the chromosomes 001110 and 101101, respectively, which of the following offspring could have resulted from a one-point crossover?
Looking at the a and b I can see the crossover point is here:
1 2
A: 00 | 1110
B: 10 | 1101
Usually the crossover is done using this formula:
a = A1 + B2
b = B1 + A2
so that possible children are:
a: 00 | 1101
b: 10 | 1110
which excludes option b from the question.
So the answer to Q1 is option a (001101), assuming the crossover function given above.
Q2: Which of the above offspring could have resulted from a two-point crossover? and why?
Looking at the a and b I can see the crossover points can be here:
1 2 3
A: 00 | 11 | 10
B: 10 | 11 | 01
Usual formula for 2-point crossover is:
a = A1 + B2 + A3
b = B1 + A2 + B3
So the children would be:
a = 00 | 11 | 10
b = 10 | 11 | 01
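Comparing them to the options in the question (lowercase a and b), we get the answer:
Q2. A: Option b (001110) could be the result of a two-point crossover of A and B with the given crossover function, because it equals the first child above. Option a (001101) is not produced with these cut points.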
Again it is not possible to answer your questions without knowing the crossover function.
The functions I provided are common in GAs, but you can invent many others, and some of them would give a different answer to the question (see the comment below):
One-point crossover makes one cut, so each child takes one segment from each parent. Two-point crossover makes two cuts, so each child takes two segments from one parent and one segment from the other.
See crossover (wikipedia) for further info.
Regarding Q1, (a) could have been produced by a one-point crossover, taking bits 0-3 from parent A and bits 4-5 from parent B. (b) could not, unless your crossover algorithm allows null contributions, i.e. a parent contributing nothing. In that case, parent A could contribute its full chromosome (bits 0-5) and parent B would contribute nothing, yielding (b).
Regarding Q2, both (a) and (b) are possible. There are a few combinations to test; too tedious to write, but you can do the work with pen and paper. :-)
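Instead of pen and paper, the combinations can be enumerated with a few lines of Python. This sketch assumes the common textbook definitions used above, with every cut point strictly inside the chromosome (no empty segments); other crossover definitions will give other sets. The function names are made up for this example.

```python
def one_point_offspring(a, b):
    """All children of a one-point crossover with an interior cut."""
    kids = set()
    for k in range(1, len(a)):
        kids.add(a[:k] + b[k:])
        kids.add(b[:k] + a[k:])
    return kids


def two_point_offspring(a, b):
    """All children of a two-point crossover with interior cuts i < j."""
    n = len(a)
    kids = set()
    for i in range(1, n):
        for j in range(i + 1, n):
            kids.add(a[:i] + b[i:j] + a[j:])
            kids.add(b[:i] + a[i:j] + b[j:])
    return kids


A, B = "001110", "101101"
for option in ("001101", "001110"):
    print(option,
          "one-point:", option in one_point_offspring(A, B),
          "two-point:", option in two_point_offspring(A, B))
```

Under these assumptions, 001101 shows up only among the one-point children and 001110 only among the two-point children. If a cut is allowed at the very start or end of the chromosome, both sets grow and the answer changes accordingly.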
