There are several algorithms out there that print all permutations of a string, but I need one that prints them in a specific order. Currently I am using a standard permutation algorithm similar to the one in the top answer (not the question itself) of this question: C++ recursive permutation algorithm for strings -> not skipping duplicates
For example, for the input "ABC", the output will be: ABC ACB BAC BCA CAB CBA
For the input "ACC", it will be: ACC CAC CCA
The outputs are all correct, but I need them in a different order. The input will only consist of the characters 'A' and 'C', and for convenience I sort the string alphabetically before passing it to the recursive function, so identical characters are always grouped together (e.g. AACCC). As for the order, I want to treat the group of 'C's as a single block that I shift one step to the left, and after each shift output the permutations of the characters to the right of the first 'C' only. So for input "ACC", the first output is "ACC", which is fine. The next output should be "CCA", because I shifted all the 'C's one step to the left. Then permuting the characters of "CCA" to the right of the first 'C' gives the final output, which is just "CAC".
I need it to look like this for these inputs:
Input: ACC
Output: ACC CCA CAC
Input: AACC
Output:
AACC ACCA ACAC CCAA CACA CAAC
Any idea how I should modify my algorithm to produce the permutations in this order?
For a string with two distinct characters A and C, given n is the number of A's, it sounds like what you're looking for is a concatenation of these sequences: All permutations beginning with exactly n A's in reverse lexicographic order, all permutations beginning with exactly n-1 A's in reverse lexicographic order, etc. So, you could take your existing output which is in lexicographic order, and iterate over it in reverse order, selecting elements matching /^A{n}C/, /^A{n-1}C/ through /^A{0}C/ and adding them to a new collection.
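A minimal Python sketch of that filtering step, assuming the existing output is available as a lexicographically sorted list of distinct strings that each contain at least one 'C'. The name `reorder_by_leading_as` is hypothetical, and `itertools.permutations` only stands in for the asker's recursive generator:

```python
import re
from itertools import permutations


def reorder_by_leading_as(lex_perms, n):
    """Regroup a lexicographically sorted list of A/C permutations.

    For k = n, n-1, ..., 0, walk the list backwards (reverse lexicographic
    order) and keep the strings that start with exactly k A's followed by a C.
    """
    result = []
    for k in range(n, -1, -1):
        pattern = re.compile(r"^A{%d}C" % k)
        result.extend(p for p in reversed(lex_perms) if pattern.match(p))
    return result


s = "AACC"
# Distinct permutations in lexicographic order, standing in for the existing output.
lex = sorted({"".join(p) for p in permutations(s)})
print(reorder_by_leading_as(lex, s.count("A")))
```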
You could generate this output directly: for each k from n down to zero, take the prefix of k A's followed by a C, and append to it each permutation of the remaining characters in reverse lexicographic order.
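The direct generation can be sketched the same way. This assumes the input contains only 'A' and 'C' with at least one 'C'; `leading_a_order` is a hypothetical helper, and the remaining characters are permuted with `itertools.permutations` plus a set to drop duplicates, which is fine for short inputs but wasteful for long ones:

```python
from itertools import permutations


def leading_a_order(s):
    """Hypothetical helper: order the distinct arrangements of an A/C string
    as k leading A's + 'C' + a reverse-lexicographic tail, for k = n, ..., 0."""
    n = s.count("A")
    m = s.count("C")  # assumed to be at least 1
    result = []
    for k in range(n, -1, -1):
        prefix = "A" * k + "C"
        rest = "A" * (n - k) + "C" * (m - 1)
        # Distinct permutations of the remaining characters, largest first.
        tails = sorted({"".join(p) for p in permutations(rest)}, reverse=True)
        result.extend(prefix + t for t in tails)
    return result


print(leading_a_order("ACC"))   # ['ACC', 'CCA', 'CAC']
print(leading_a_order("AACC"))  # ['AACC', 'ACCA', 'ACAC', 'CCAA', 'CACA', 'CAAC']
```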
My array is:
let arr=['20336.41905.32121.58472_20336.41905.60400.51092_1',
'20336.41905.32121.58472_20336.41905.60400.48025_2',
'20336.41905.32121.58472_20336.41905.41816.60719_3',
'20336.41905.32121.58472_20336.41905.41816.63631_4',
'20336.41905.32121.58472_20336.41905.31747.22942_2',
]
I want to sort it into this order:
['20336.41905.32121.58472_20336.41905.60400.51092_1',
'20336.41905.32121.58472_20336.41905.60400.48025_2',
'20336.41905.32121.58472_20336.41905.31747.22942_2',
'20336.41905.32121.58472_20336.41905.41816.60719_3',
'20336.41905.32121.58472_20336.41905.41816.63631_4',
]
We can try sorting using a lambda expression:
var arr = ['20336.41905.32121.58472_20336.41905.60400.51092_1',
'20336.41905.32121.58472_20336.41905.60400.48025_2',
'20336.41905.32121.58472_20336.41905.41816.60719_3',
'20336.41905.32121.58472_20336.41905.41816.63631_4',
'20336.41905.32121.58472_20336.41905.31747.22942_2',
];
arr.sort((a, b) => parseInt(a.split(/_(?!.*_)/)[1]) - parseInt(b.split(/_(?!.*_)/)[1]));
console.log(arr);
The logic used above is to split on the final underscore of each array value, and parse the right side number to an integer. Then we do a comparison of each pair of numbers to achieve the sort.
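For comparison, the same idea in Python (a sketch of the logic, not a JavaScript replacement): `str.rsplit` with a maxsplit of 1 splits on the final underscore only, and because Python's sort is stable, the two entries ending in `_2` keep their original relative order, matching the desired output:

```python
arr = [
    "20336.41905.32121.58472_20336.41905.60400.51092_1",
    "20336.41905.32121.58472_20336.41905.60400.48025_2",
    "20336.41905.32121.58472_20336.41905.41816.60719_3",
    "20336.41905.32121.58472_20336.41905.41816.63631_4",
    "20336.41905.32121.58472_20336.41905.31747.22942_2",
]

# rsplit("_", 1) splits once, on the final underscore; the key is the number after it.
arr.sort(key=lambda s: int(s.rsplit("_", 1)[1]))
print(arr)
```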
Assuming you are using JavaScript...
The sort function can take a callback as an argument. You can use this to determine the criteria by which you desire to sort your array.
When the callback compares two values a and b, it should return a number greater than zero if a goes after b, less than zero if a goes before b, and exactly zero if either may come first (e.g. 1.10 and 1.100 could be sorted as [1.10, 1.100] or [1.100, 1.10], because for all we care they have the same value). In your particular case, the two array elements ending in 2 follow the same principle, because the last digit is the only criterion.
An example would be:
arr.sort((a, b)=>{
return parseInt(a.slice(-1)) - parseInt(b.slice(-1))
})
Note that this only works if the last character of every element in your array is a digit, and it never looks at the second-to-last character to break a tie when the last characters are equal.
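The comparator contract described above maps directly onto Python's `functools.cmp_to_key`. This sketch ports the last-character comparator (the name `by_last_digit` is made up for illustration) and, like the JavaScript version, assumes every element ends in a digit:

```python
from functools import cmp_to_key

arr = [
    "20336.41905.32121.58472_20336.41905.60400.51092_1",
    "20336.41905.32121.58472_20336.41905.60400.48025_2",
    "20336.41905.32121.58472_20336.41905.41816.60719_3",
    "20336.41905.32121.58472_20336.41905.41816.63631_4",
    "20336.41905.32121.58472_20336.41905.31747.22942_2",
]


def by_last_digit(a, b):
    # < 0: a goes first, > 0: b goes first, 0: either order is acceptable
    # (Python's sort is stable, so ties keep their original relative order).
    return int(a[-1]) - int(b[-1])


arr.sort(key=cmp_to_key(by_last_digit))
print(arr)
```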
There is also an uglier solution that could work, although I don't really recommend it: it takes all characters into account, in reverse order.
Map over all elements, reverse them, sort the array without using a callback (arr.sort()), and then reverse all elements again.
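A sketch of that reverse, sort, reverse-back trick in Python. Note what it does with ties: the two entries ending in `_2` are then ordered by their remaining characters read right to left, so `...31747.22942_2` comes before `...60400.48025_2`, which differs from the order asked for in the question:

```python
arr = [
    "20336.41905.32121.58472_20336.41905.60400.51092_1",
    "20336.41905.32121.58472_20336.41905.60400.48025_2",
    "20336.41905.32121.58472_20336.41905.41816.60719_3",
    "20336.41905.32121.58472_20336.41905.41816.63631_4",
    "20336.41905.32121.58472_20336.41905.31747.22942_2",
]

# Reverse every string, sort the reversed strings normally, then reverse them back.
result = [r[::-1] for r in sorted(s[::-1] for s in arr)]
print(result)
```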
Statement:
You are given an array of strings S. You can make a string by concatenating elements of S (the same element may be used more than once).
In some cases, there are several ways to make a given string from S.
Example:
S = {a, ab, ba}
Then there are 2 ways to make the string "aba":
"a" + "ba"
"ab" + "a"
Question:
Given an array of strings S, find the shortest string that can be made from S in more than one way. If there is none, print -1.
P.S. I have been thinking about this for many days, but this is the best approach I've come up with so far:
Generate all permutations of the array
For each permutation, make a string from the array S
Check if that string is made before, if yes, print out the string, if not, save that string.
But this algorithm clearly won't pass all the test cases. I can't think of any better algorithm.
Imagine that you have found your string, and you are matching it two different ways with strings from S. You start with two different strings that match the prefix, and then you repeatedly add a string from S to the shorter one until you end up with a matching length. From your example, that's
"ab"   "a"
"ab"   "aba"
"aba"  "aba"
At every step, you have 2 different strings from S that overlap at the end.
Imagine a directed graph where every vertex is a tuple (i,j,t), where i and j are the indexes of the overlapping strings at the end, and t is the number of characters left over at the end of the longer one after the overlapping section. Make it a rule that t >= 0 and that string i is always the one that ends first.
The edges of the graph indicate which vertices you can reach by adding a new string to the shorter one, with a cost equal to the length of the added string. Of course, you can only add a string if it overlaps with the t characters left over on the longer side.
Your task is then to use Dijkstra's algorithm to find the shortest path in this graph, from an initial selection of 2 distinct strings to a pair with t=0. Initially sorting the array of strings will let you use a binary search to find the strings that overlap the required suffix (the longer ones will all be together), which is an effective optimization.
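A compact Python sketch of this search. It makes two simplifications that are not part of the answer above: a state is keyed by the leftover suffix itself rather than by (i, j, t), since only those t characters affect what can follow, and the path cost is the length of the longer side, which equals the length of the answer once t reaches 0. Duplicate and empty strings are dropped up front (an assumption), a linear scan replaces the binary search over the sorted array, and `shortest_ambiguous` is a hypothetical name:

```python
import heapq


def shortest_ambiguous(S):
    """Shortest string with two different factorizations over S, or -1."""
    words = sorted({w for w in S if w})  # assumption: drop duplicates and empty strings
    heap = []
    # Start from two distinct words where one is a prefix of the other.
    # Each entry: (length of the longer side, longer side so far, leftover suffix).
    for x in words:
        for y in words:
            if x != y and y.startswith(x):
                heapq.heappush(heap, (len(y), y, y[len(x):]))
    done = set()
    while heap:
        cost, built, rest = heapq.heappop(heap)
        if rest == "":
            return built  # both sides end at the same place
        if rest in done:
            continue
        done.add(rest)
        for w in words:
            if rest.startswith(w):
                # w fits inside the overhang: the longer side does not grow.
                heapq.heappush(heap, (cost, built, rest[len(w):]))
            elif w.startswith(rest):
                # w sticks out past the overhang: it becomes the longer side.
                extra = w[len(rest):]
                heapq.heappush(heap, (cost + len(extra), built + extra, extra))
    return -1


print(shortest_ambiguous(["a", "ab", "ba"]))  # aba
```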
Here's an O(N^3) algorithm, where N is the total length of all the strings in S:
1. For every element S_i in S:
   1.1. Construct an NFA for the regular expression S_i(S_1|...|S_n)*
   1.2. Construct an NFA for the regular expression (S_1|...|S_{i-1}|S_{i+1}|...|S_n)(S_1|...|S_n)*
   1.3. Construct an NFA that is the intersection of the NFAs in steps 1.1 and 1.2
   1.4. Find the shortest string accepted by the NFA in step 1.3
2. Return the shortest string among the strings found in step 1.4
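This sketch runs the intersection as a breadth-first search over pairs of states of the (S_1|...|S_n)* automaton, where each copy is either between words or partway through one word. Instead of looping over each S_i separately, it seeds the search with every pair of distinct first words at once, which covers the same cases in a single pass. Duplicate and empty strings are dropped (an assumption), and `shortest_ambiguous_bfs` is a hypothetical name:

```python
from collections import deque


def shortest_ambiguous_bfs(S):
    """Shortest string with two factorizations over S whose first words differ, or -1."""
    words = sorted({w for w in S if w})  # assumption: drop duplicates and empty strings

    def step(state, c):
        # state is None (between words) or (k, p): p chars of words[k] already read.
        candidates = [(k, 0) for k in range(len(words))] if state is None else [state]
        nxt = []
        for k, p in candidates:
            if words[k][p] == c:
                nxt.append(None if p + 1 == len(words[k]) else (k, p + 1))
        return nxt

    alphabet = sorted(set("".join(words)))
    # Seed the product: copy A commits to word a, copy B to a different word b.
    start = [((a, 0), (b, 0)) for a in range(len(words))
             for b in range(len(words)) if a != b]
    parent = {s: None for s in start}
    queue = deque(start)
    while queue:
        state = queue.popleft()
        for c in alphabet:
            for na in step(state[0], c):
                for nb in step(state[1], c):
                    nxt = (na, nb)
                    if nxt in parent:
                        continue
                    parent[nxt] = (state, c)
                    if nxt == (None, None):  # both copies just finished a word
                        chars, node = [], nxt
                        while parent[node] is not None:
                            node, ch = parent[node]
                            chars.append(ch)
                        return "".join(reversed(chars))
                    queue.append(nxt)
    return -1


print(shortest_ambiguous_bfs(["a", "ab", "ba"]))  # aba
```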
The above algorithm can be improved to O(N^2 log N):
1. Construct an NFA for the regular expression (S_1|...|S_n)*
2. Construct the cross-product of two copies of the NFA in step 1.
3. For each state in the NFA in step 2, find the shortest string accepted by the NFA from that state.
4. Let 𝒮 = {S}.
5. While 𝒮 is not empty:
   5.1. Take an element T from 𝒮.
   5.2. Partition T into P and Q somewhat evenly.
   5.3. Construct an NFA for the regular expression (P_1|...|P_p)(S_1|...|S_n)*, reusing the NFA in step 1.
   5.4. Construct an NFA for the regular expression (Q_1|...|Q_q)(S_1|...|S_n)*, reusing the NFA in step 1.
   5.5. Construct the cross-product of the NFAs in steps 5.3 and 5.4, reusing the NFA in step 2.
   5.6. Find the shortest string accepted by the NFA in step 5.5, reusing the result of step 3.
   5.7. If P has more than 1 element, put P into 𝒮.
   5.8. If Q has more than 1 element, put Q into 𝒮.
6. Return the shortest string among the strings found in step 5.6.
Edit: The following is an O(N^2) algorithm, improved from the top answer:
1. Let T be the set of all suffixes of the strings in S (including the empty suffix).
2. Build a trie out of T.
3. Let G be a weighted directed graph whose vertices are the elements of T. For each string S_i in S and T_j in T, if S_i = T_j + D or T_j = S_i + D (using the trie from step 2 to find all such pairs), add an edge from D to T_j with weight equal to the length of S_i.
4. Find the distance from the empty string to every vertex in G.
5. For each pair of strings S_i, S_j in S with S_i != S_j and S_i = S_j + D (using the trie from step 2 to find all such pairs), find the distance from the empty string to D (using step 4).
6. The length of the answer is half of the smallest value of |S_i| + |S_j| + (distance to D) over the pairs in step 5, because the distance only counts the strings appended after S_i and S_j. (It's trivial to find the actual answer but I'm too lazy to describe it :p)
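A Python sketch of these steps, with a plain `str.startswith` scan standing in for the trie (so it is slower than O(N^2)). It adds |S_i| + |S_j| to the graph distance before halving, because the distance only counts the strings appended after the first two; for S = {a, ab, ba} the distance to "b" is 3 and the answer length is (2 + 1 + 3) / 2 = 3. It returns only the length, drops duplicate and empty strings (an assumption), and `shortest_ambiguous_length` is a hypothetical name:

```python
import heapq


def shortest_ambiguous_length(S):
    words = sorted({w for w in S if w})  # assumption: drop duplicates and empty strings
    suffixes = {w[i:] for w in words for i in range(len(w) + 1)}  # T, includes ""
    # Edges are stored reversed: D -> T_j with weight len(S_i).
    radj = {t: [] for t in suffixes}
    for t in suffixes:
        for s in words:
            if s.startswith(t):      # S_i = T_j + D
                d = s[len(t):]
            elif t.startswith(s):    # T_j = S_i + D
                d = t[len(s):]
            else:
                continue
            radj[d].append((t, len(s)))
    # Dijkstra from the empty string.
    dist = {"": 0}
    heap = [(0, "")]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue
        for v, w in radj[u]:
            if du + w < dist.get(v, float("inf")):
                dist[v] = du + w
                heapq.heappush(heap, (du + w, v))
    best = None
    for si in words:
        for sj in words:
            if si != sj and si.startswith(sj) and si[len(sj):] in dist:
                total = len(si) + len(sj) + dist[si[len(sj):]]
                best = total if best is None else min(best, total)
    return -1 if best is None else best // 2


print(shortest_ambiguous_length(["a", "ab", "ba"]))  # 3
```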
I'm trying to compare how similar two sequences are to each other in Ruby. For example, 1234657890 compared to 1234567890 is 80% similar. I've thought about decreasing the number by 1 digit and then checking, but then that example comes out as only 40% similar.
You could do something like:
num_str1 = '1234567890'
num_str2 = '1234657890'
num_str1.chars.zip(num_str2.chars).count { |a, b| a == b }
#=> 8
This converts each string to character-arrays, then pairing elements by index, before comparing them. The percentage calculation is left as an exercise. See ruby-docs.org for more info on the methods used.
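The same positional comparison, including the percentage step, sketched in Python rather than Ruby. It assumes both strings have the same length, since it only compares characters at matching positions; `positional_similarity` is a hypothetical name:

```python
def positional_similarity(s1, s2):
    """Percentage of positions at which two equal-length strings agree."""
    if len(s1) != len(s2):
        raise ValueError("this sketch only handles equal-length strings")
    if not s1:
        return 100.0
    matches = sum(a == b for a, b in zip(s1, s2))
    return 100 * matches / len(s1)


print(positional_similarity("1234567890", "1234657890"))  # 80.0
```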
I want to check if any string in an array of strings is a prefix of any other string in the same array. I'm thinking radix sort, then single pass through the array.
Anyone have a better idea?
I think radix sort can be modified to find prefixes on the fly. All we have to do is sort the lines by their first letter, storing a copy of each line without its first letter in the corresponding cell. Then, if a cell contains an empty line, that line is a prefix. And if a cell contains only one entry, there cannot be any prefix lines in it.
Here is the code, which might be clearer than my English:
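lines = [
    "qwerty",
    "qwe",
    "asddsa",
    "zxcvb",
    "zxcvbn",
    "zxcvbnm",
]

# Pair each line's unconsumed remainder with the original line.
line_lines = [(line, line) for line in lines]

def find_sub(line_lines):
    # One bucket per lowercase letter (assumes the input is lowercase a-z only).
    cells = [[] for _ in range(26)]
    for rest, line in line_lines:
        if rest == "":
            # The whole line has been consumed, so it is a prefix of the
            # other lines that shared its bucket.
            print(line)
        else:
            index = ord(rest[0]) - ord('a')
            cells[index].append((rest[1:], line))
    for cell in cells:
        # A bucket with a single entry cannot hold a prefix of another line.
        if len(cell) > 1:
            find_sub(cell)

find_sub(line_lines)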
If you sort them, you only need to check each string if it is a prefix of the next.
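A minimal Python sketch of the sort-then-check-neighbours approach. It works because if x is a prefix of y, every string sorted between them also starts with x, so x is a prefix of its immediate successor. It reports only adjacent pairs (and treats equal strings as prefixes of each other), which is enough to answer whether any prefix exists; `adjacent_prefix_pairs` is a hypothetical name:

```python
def adjacent_prefix_pairs(words):
    """Return (prefix, word) pairs found by comparing neighbours in sorted order."""
    ordered = sorted(words)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b.startswith(a)]


lines = ["qwerty", "qwe", "asddsa", "zxcvb", "zxcvbn", "zxcvbnm"]
pairs = adjacent_prefix_pairs(lines)
print(bool(pairs))  # True: at least one string is a prefix of another
print(pairs)
```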
To achieve a time complexity close to O(N^2), compute a hash value for each string.
Come up with a good hash function that looks something like this:
A mapping from [a-z] to [1, 26]
A modulo operation (with a large prime) to prevent integer overflow
So "ab" maps to the digits 1 and 2, and with base 27 it is hashed as 1*27 + 2 = 29.
A point to note:
Be careful which base you compute the hash value in. For example, if you take a base smaller than 26, two different strings can have the same hash value even before the modulo, and we don't want that.
Steps:
Compute the hash value for each string.
Compare the hash value of the current string with those of the other strings. I'll let you figure out how you would do that comparison. Once two hash values match, you are still not sure that one string really is a prefix of the other (because of the modulo operation), so do an extra check to confirm it.
Report answer
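One way to do the comparison left open above, sketched in Python: put the hash of every full string into a dictionary, then compute rolling hashes of each string's proper prefixes and look them up, confirming every hit with `str.startswith` to rule out collisions. The base 27, the modulus 2^61 - 1 (a Mersenne prime) and the lowercase-only input are assumptions, and `find_prefix_pairs` is a hypothetical name:

```python
BASE = 27
MOD = (1 << 61) - 1  # a large prime, to keep the hash values bounded


def char_value(ch):
    # Map 'a'..'z' to 1..26 (assumes lowercase ASCII letters only).
    return ord(ch) - ord("a") + 1


def full_hash(word):
    h = 0
    for ch in word:
        h = (h * BASE + char_value(ch)) % MOD
    return h


def find_prefix_pairs(words):
    by_hash = {}
    for w in words:
        by_hash.setdefault(full_hash(w), []).append(w)
    pairs = []
    for w in words:
        h = 0
        for length, ch in enumerate(w[:-1], start=1):  # proper prefixes only
            h = (h * BASE + char_value(ch)) % MOD
            for cand in by_hash.get(h, []):
                # A hash match may be a collision, so verify explicitly.
                if len(cand) == length and w.startswith(cand):
                    pairs.append((cand, w))
    return pairs


print(find_prefix_pairs(["qwerty", "qwe", "asddsa", "zxcvb", "zxcvbn", "zxcvbnm"]))
```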
We are given a string of the form RBBR, where R is red and B is blue.
We need to find the minimum number of swaps required to group the colors together. In the above case the answer would be 1, giving RRBB or BBRR.
I feel like an algorithm to sort a partially sorted array would be useful here since a simple sort would give us the number of swaps, but we want the minimum number of swaps.
Any ideas?
This is allegedly a Microsoft interview question according to this.
Take one pass over the string and count the number of reds (#R) and the number of blues (#B). Then take a second pass counting the number of reds in the first #R balls (#r) and the number of blue balls in the first #B balls (#b). The lesser of (#R - #r) and (#B - #b) will be the minimum number of swaps needed.
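The two-pass count translates almost line for line into Python. This sketch assumes the string contains only 'R' and 'B' and that a swap may exchange any two positions (not just neighbours); `min_swaps` is a hypothetical name:

```python
def min_swaps(s):
    n_r = s.count("R")        # #R
    n_b = len(s) - n_r        # #B
    r = s[:n_r].count("R")    # reds already inside the first #R slots
    b = s[:n_b].count("B")    # blues already inside the first #B slots
    # Each swap fixes one misplaced R and one misplaced B at the same time.
    return min(n_r - r, n_b - b)


print(min_swaps("RBBR"))      # 1
print(min_swaps("RBRRBRBR"))  # 2
```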
We are given the string S that we have to convert to the final string F = R^a B^b or B^b R^a. The number of differences between S and F should be even because for every misplaced R there will be a complementary misplaced B. So why not find the minimum number of differences between S and both possible F's and divide that by 2?
For example, you're given S = RBRRBRBR which should convert to
RRRRRBBB
or
BBBRRRRR
Comparing S with each possible F character by character, there are 4 differences for each possible final string, so either way the minimum is 2 swaps.
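A short Python sketch of the differences-divided-by-two idea, under the same assumptions as the previous answer (only 'R' and 'B', any two positions may be swapped); `min_swaps_by_diff` is a hypothetical name:

```python
def min_swaps_by_diff(s):
    a = s.count("R")
    b = len(s) - a
    targets = ["R" * a + "B" * b, "B" * b + "R" * a]
    # Every mismatch of one colour is paired with a mismatch of the other,
    # so one swap removes exactly two differences.
    return min(sum(x != y for x, y in zip(s, t)) for t in targets) // 2


print(min_swaps_by_diff("RBRRBRBR"))  # 2
```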
Let's look at your example. You know that the end state will be RRBB or BBRR. In other words, the end state is always nRmB or mBnR, where n is the number of R's and m is the number of B's in your string.
Since the end state is defined, maybe some sort of path-finding algorithm would be a good approach for this? How about considering each swap as a state change and coming up with a heuristic function to approximate the number of remaining swaps needed.
I'm just throwing an idea in the air, but I hope this helps.
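To make the path-finding idea concrete, here is a brute-force Python sketch that uses plain breadth-first search over strings (equivalent to A* with a zero heuristic). It tries every possible swap, so it is only practical for short strings, but it can be handy for checking the faster formulas; `min_swaps_bfs` is a hypothetical name:

```python
from collections import deque


def min_swaps_bfs(s):
    n_r = s.count("R")
    n_b = len(s) - n_r
    goals = {"R" * n_r + "B" * n_b, "B" * n_b + "R" * n_r}
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur in goals:
            return dist[cur]
        # Each state change swaps two differently coloured balls.
        for i in range(len(cur)):
            for j in range(i + 1, len(cur)):
                if cur[i] != cur[j]:
                    nxt = cur[:i] + cur[j] + cur[i + 1:j] + cur[i] + cur[j + 1:]
                    if nxt not in dist:
                        dist[nxt] = dist[cur] + 1
                        queue.append(nxt)


print(min_swaps_bfs("RBBR"))  # 1
```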
Start with two indices simultaneously from the right and left end of the string. Advance the left index until you find an R. Advance the right index backwards until you find a B. Swap them. Repeat until the left index meets the right index, and count the swaps. Then, do the same, but look for B on the left and R on the right. The minimum is the lower of both swap counts.
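The two-pointer procedure can be sketched in Python as follows, again assuming only 'R' and 'B' and arbitrary (non-adjacent) swaps; `count_swaps` and `min_swaps_two_pointer` are hypothetical names:

```python
def count_swaps(s, left_char, right_char):
    """Swap each left_char found from the left with a right_char found from the right."""
    chars = list(s)
    i, j = 0, len(chars) - 1
    swaps = 0
    while True:
        while i < j and chars[i] != left_char:
            i += 1
        while i < j and chars[j] != right_char:
            j -= 1
        if i >= j:
            return swaps
        chars[i], chars[j] = chars[j], chars[i]
        swaps += 1
        i += 1
        j -= 1


def min_swaps_two_pointer(s):
    # First pass builds B...BR...R, second pass builds R...RB...B.
    return min(count_swaps(s, "R", "B"), count_swaps(s, "B", "R"))


print(min_swaps_two_pointer("RBBR"))  # 1
```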
I think the number of swaps can be derived from the number of inversions required to sort the vector. This is the example of doing the same with permutation vector.
This isn't a technical answer, but I looked at this more intuitively.
RRBBBBR can be reduced to RBR, since a group of R's can be moved as a single block. This means that the array is really just N sets of RB.
The only thing that matters is the number of N sets of RB blocks (including incomplete blocks for the last one).
RBR -> 1 swap to get to RRB (2 sets of RB block, RB and R)
RBRB-> 1 swap to get to RRBB (2 full sets of RB blocks)
RBRBRB-> 2 swaps to get to RRRBBB (3 full sets of RB blocks)
RBRBRBRB -> 4 sets of RB = 3 swaps
So to generalize this, the number of swaps needed = N sets of RB block (including incomplete blocks) and subtract 1.