In short: I have two arrays which may be different, and I'd like to get the difference/transformation as a series of "actions" (adds and removes). That is, in a basic example:
Current: [a, b, d]
Desired: [a, b, c, d]
Actions: Add c in position 2
Basically, the instructions are how to transform the current array so that it has the same members and order as the desired array. For my application, each change triggers events which update the UI and so on, so it would be highly preferable if the actions were not "redundant": that is, the above could also have been remove d, add c # 2, add d # 3, but this would cause a lot of unwanted processing elsewhere in the system.
Perhaps as another example which might help illustrate:
Current: [a, b, d]
Desired: [b, c, d, a]
Actions: remove a, add c # 1, add a # 3
I figure this is something which has been solved before, but it's kinda difficult to search for it since "array difference" doesn't give you the right results.
If it matters, I'm implementing this in JavaScript, but I guess the algorithm is language-agnostic.
This does indeed exist; it's called the edit distance. The basic algorithm doesn't remember the kind of edits, but it's easily modified.
One type of edit distance is the Levenshtein distance. The Wikipedia page on it contains some code snippets you may find useful.
Hirschberg's algorithm may also be useful.
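Since you only want adds and removes (no substitutions), the variant that matches your examples is a diff built on the longest common subsequence: keep the LCS, emit a remove for everything else in the current array (highest index first, so earlier indices stay valid), then emit an add for every missing item of the desired array at its target position (lowest index first). Here is a minimal sketch of that idea in Python (the function names are mine; the logic ports directly to JavaScript):

def lcs_keep(a, b):
    # Indices of a and of b that belong to one longest common subsequence.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    keep_a, keep_b, i, j = set(), set(), 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            keep_a.add(i); keep_b.add(j); i += 1; j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return keep_a, keep_b

def edit_script(current, desired):
    keep_cur, keep_des = lcs_keep(current, desired)
    actions = []
    for i in reversed(range(len(current))):       # removals, highest index first
        if i not in keep_cur:
            actions.append(("remove", current[i], i))
    for j in range(len(desired)):                 # additions, lowest index first
        if j not in keep_des:
            actions.append(("add", desired[j], j))
    return actions

print(edit_script(["a", "b", "d"], ["a", "b", "c", "d"]))
# [('add', 'c', 2)]
print(edit_script(["a", "b", "d"], ["b", "c", "d", "a"]))
# [('remove', 'a', 0), ('add', 'c', 1), ('add', 'a', 3)]

Applying the actions in the order they are emitted reproduces the desired array, and no item that stays in place is ever touched, so you avoid the redundant remove/add churn. If the arrays get large, the O(n*m) table can be reduced to linear memory with Hirschberg's algorithm mentioned above.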
Related
I have the following problem: I have a 4-dim tensor (b, c, h, w) that I would like to shuffle within c, which is divided into n_groups, with a different permutation for each h.
Let's say c=16 and n_groups is 4 ---> num_elements=4 (in a group).
So, within these 4 channels I would like to shuffle the h's.
solution screenshot
The solution above works but is very slow, and I have not found any way to accelerate it.
Do you have any suggestions for how it could be improved?
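The screenshot isn't reproduced here, so I'm guessing at the loop-based version, but if the goal is an independent random permutation of the channels inside each group for every h index, one common way to avoid Python loops is to build all the permutations at once with torch.argsort over random noise and apply them with torch.gather. A rough sketch (grouped_channel_shuffle is a name I made up):

import torch

def grouped_channel_shuffle(x, n_groups):
    # x: (b, c, h, w); permute the channels within each group,
    # with an independent permutation for every h index
    b, c, h, w = x.shape
    g = c // n_groups                                  # channels per group
    x = x.reshape(b, n_groups, g, h, w)
    # one random permutation of the g channels for every (group, h) pair
    perm = torch.argsort(torch.rand(n_groups, g, h, device=x.device), dim=1)
    idx = perm.reshape(1, n_groups, g, h, 1).expand(b, n_groups, g, h, w)
    x = torch.gather(x, 2, idx)
    return x.reshape(b, c, h, w)

x = torch.randn(2, 16, 8, 8)
print(grouped_channel_shuffle(x, n_groups=4).shape)   # torch.Size([2, 16, 8, 8])

If the permutations should also differ per batch element or per w position, add those dimensions to the noise tensor before the argsort and adjust the expand accordingly.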
I am trying to solve a set of two equations for two complex variables in terms of the other terms.
I put the two equations in a list and tried to solve for both variables together, but this kept giving me incorrect results.
However, if I manually separate out the terms in each equation and substitute them, I'm able to obtain the right answer. This is extremely puzzling, and I don't understand whether this is a bug in Maxima or something wrong in what I'm doing. Any guidance/comments will be appreciated.
Here's a minimal example that shows what I'm talking about (along with the final output). In the following, sol1 gives me incorrect solutions and sol2 gives me the right solution.
(%i2) kill(all)$
declare([R0, a, b, x, y], complex)$
eqs:[(2*%i*x*ω+24*R0^2*conjugate(R0)*b^2*conjugate(b)-48*R0^2*conjugate(R0)*a*b*conjugate(b)+24*R0^2*conjugate(R0)*a^2*conjugate(b)-24*R0^2*conjugate(R0)*conjugate(a)*b^2+48*R0^2*conjugate(R0)*a*conjugate(a)*b+%i*x-2*x-24*R0^2*conjugate(R0)*a^2*conjugate(a)), (6*%i*y*ω+8*R0^3*b^3-24*R0^3*a*b^2+24*R0^3*a^2*b+3*%i*y-2*y-8*R0^3*a^3)]$
(%i4) sol1:factor(solve(eqs, [x, y])[1])$
sol2:factor([solve(eqs[1],x)[1],solve(eqs[2],y)[1]])$
(%i6) factor(subst(sol1,eqs));
factor(subst(sol2,eqs));
(%o5) [0,-24*R0^3*(b-a)^3*(2*%i*ω+%i-1)]
(%o6) [0,0]
Here's a screenshot from wxMaxima, if that helps you see it better:
I am going to implement a personal recommendation system using the Apriori algorithm.
I know there are three useful concepts: 'support', 'confidence' and 'lift', and I already know what they mean. I also know how to find the frequent item sets using the support concept. But I wonder why the confidence and lift concepts are there if we can already find frequent item sets using the support rule.
Could you explain why the 'confidence' and 'lift' concepts are there when the 'support' concept has already been applied, and how I can proceed with 'confidence' and 'lift' if I have already used support on the data set?
I would be highly obliged if you could answer with SQL queries since I am still an undergraduate. Thanks a lot
Support alone yields many redundant rules.
e.g.
A -> B
A, C -> B
A, D -> B
A, E -> B
...
The purpose of lift and similar measures is to remove complex rules that are not much better than the simple rule.
In the above case, the simple rule A -> B may have less confidence than the complex rules, but much more support. The other rules may just be coincidences riding on this strong pattern, with marginally stronger confidence because of the smaller sample size.
Similarly, if you have:
A -> B confidence: 90%
C -> D confidence: 90%
A, C -> B, D confidence: 80%
then the last rule is actually bad, despite its high confidence!
The first two rules yield the same outcome, but with higher confidence. So that last rule shouldn't count as 80% correct, but as -10% correct, if you assume the first two rules to hold!
Thus, support and confidence alone are not enough to consider.
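I know you asked for SQL, but the three definitions are easiest to see in a few lines of code, so here is a toy calculation in Python on a made-up five-transaction data set; the same counting translates directly into SQL aggregates over a transactions table.

# made-up toy data: each transaction is a set of items
transactions = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "C"},
    {"B", "C"}, {"A", "B", "D"},
]
n = len(transactions)

def support(itemset):
    # fraction of transactions containing every item of the itemset
    return sum(itemset <= t for t in transactions) / n

def confidence(lhs, rhs):
    # support of the whole rule divided by support of its left-hand side
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    # confidence relative to how often the right-hand side occurs anyway;
    # lift <= 1 means the rule is no better than chance
    return confidence(lhs, rhs) / support(rhs)

lhs, rhs = {"A"}, {"B"}
print(support(lhs | rhs))    # 0.6
print(confidence(lhs, rhs))  # 0.75
print(lift(lhs, rhs))        # 0.9375 -> here A -> B is slightly worse than chance

So the usual workflow is: use support (Apriori) to find the frequent itemsets, generate candidate rules from them, and then use confidence and lift to throw away the rules that are redundant or no better than chance, as described above.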
So I've been trying to solve this for some hours now, but apparently there's still something missing. Maybe I'm thinking the wrong way, but I think it is a very complex problem:
I have three lists with items in a fixed order. To explain the problem, assume they contain items A to Z, mostly in the same order with some exceptions where items can be in different positions. Also, only one list contains all items; the others contain a subset and are missing certain items. A solution for that case would be sufficient, but it could also happen that no list contains all items, only partly overlapping sets. Even better would be an algorithm that solves the problem for multiple (> 3) lists.
So here's the example:
List 1: A B C D E F G H I J
List 2: A C D B F G
List 3: B C D E H F G
Now what I want is to match these three lists to visualize where the sort order is different and where items are missing. So the result should be:
List 1: A B C D E F G H I J
List 2: A C D B F G
List 3: B C D E H F G
So I can immediately see that List 2 has B at the wrong position, that A is missing from List 3, and that List 3 also has H in the wrong position.
I was thinking about storing the result in a CSV to import into Excel. So the rows are:
A,A,
B,,B
C,C,C
...
Now my question is: how do I match the lists that way to generate the CSV output? The language I use is Java. So far I have failed on the problem that a list other than the reference list can contain items earlier that only appear later in the reference list.
This is by the way a real-world problem.
Any suggestions are appreciated.
There are off-the-shelf tools for solving this problem, such as the Unix tool diff3. Trying to solve it for arbitrary numbers of lists is not advisable unless you are willing to invest a lot of time in developing heuristics, as you are then dealing with the NP-hard general case of the longest common subsequence problem.
If I understand your question correctly, you are essentially trying to solve a multiple sequence alignment problem, which is a well-researched topic within bioinformatics. There are several algorithms for it, some of which are based on the concept of Levenshtein distance (which would solve a two-array version of your problem) - I suggest you start there.
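To make that concrete, here is one simple (not necessarily optimal) pairwise approach in Python, which should port straightforwardly to Java: pick one list as the reference, align every other list to it with a longest common subsequence, and print one CSV row per reference item. Items that are missing or out of order in a list simply come out as blanks in its column. The function name lcs_indices is my own.

def lcs_indices(ref, other):
    # indices of ref that belong to a longest common subsequence of ref and other
    m, n = len(ref), len(other)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if ref[i] == other[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    keep, i, j = set(), 0, 0
    while i < m and j < n:
        if ref[i] == other[j]:
            keep.add(i); i += 1; j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return keep

reference = list("ABCDEFGHIJ")            # List 1
others = [list("ACDBFG"),                 # List 2
          list("BCDEHFG")]                # List 3
kept = [lcs_indices(reference, other) for other in others]

for i, item in enumerate(reference):
    # blank column when the item is missing from (or out of order in) that list
    print(",".join([item] + [item if i in k else "" for k in kept]))

For your example this prints A,A, then B,,B then C,C,C and so on: the B of List 2 and the H of List 3 are exactly the items that fall out of the LCS, which is how the blanks arise. If you also need to show where those out-of-order items ended up, you would emit extra rows for the non-LCS items of each list.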
The Problem "Consider a relation R with five attributes ABCDE. You are given the following dependancies
A->B
BC->E
ED->A
List all the keys for R."
The teacher gave us the keys, which are ACD, BCD, CDE,
and we need to show the work to get to them.
The first two I solved.
For BCD, I used the transitivity of 2 with 3 to get (BC->E)D->A => BCD->A,
and for ACD I used the transitivity of 1 with 4 (BCD->A) to get (A->B)CD->A => ACD->A.
But I can't figure out how to get CDE.
So it seems I did it wrong; after googling I found this answer:
Methodology to find keys:
Consider attribute sets α containing (a) the determinant attributes of F (i.e. A, BC, ED) and (b) the attributes NOT contained in the determined ones (i.e. C, D). Then run the attribute closure algorithm:
if α+ is a superset of R, then α -> R
Three keys: CDE, ACD, BCD
Source
From what I can tell, since C and D never appear on the right-hand side of any dependency, the keys are the determinants (left-hand sides) with CD added to them. Can anyone explain to me in better detail why this works?
To get the keys, you start with one of the dependencies and use inference to extend the set.
Let me have a go in simple English; you can easily find the formal definitions on the net.
e.g. start with (3):
ED -> A
(knowing E and D, I know A)
ED -> AB
(knowing E and D, I know A; by knowing A, I know B as well)
Still, C cannot be known, and I have now used all the rules except BC->E,
so I add C to the left-hand side, i.e.
CDE -> AB
So, by knowing C, D and E, you will know A and B as well.
Hence CDE is a key for your relation ABCDE. You repeat the same process, starting with the other rules, until they are exhausted.
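The quoted methodology boils down to the attribute-closure algorithm: starting from a candidate set α, keep adding the right-hand side of every dependency whose left-hand side is already covered, and α is a (super)key exactly when the closure reaches all of ABCDE. A small Python sketch, just to verify the three keys (the variable names are mine):

R = set("ABCDE")
FDS = [(set("A"),  set("B")),   # A  -> B
       (set("BC"), set("E")),   # BC -> E
       (set("DE"), set("A"))]   # ED -> A

def closure(attrs):
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in FDS:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed

for candidate in ("CDE", "ACD", "BCD"):
    print(candidate, closure(set(candidate)) == R)   # True for all three

This also shows why every key must contain C and D: they never appear on the right-hand side of any dependency, so nothing else can derive them, and CD on its own determines nothing further, which is why each determinant (A, BC, ED) combined with CD gives exactly the keys ACD, BCD and CDE.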