I am going through my system dictionary and looking for words that are, according to a strict definition, neither subsets nor supersets of any other word.
The implementation below does not work, but if it did, it would be pretty efficient, I think. How do I iterate through the array and also remove items from that same array during iteration?
def collect_dead_words
  result = #file #the words in my system dictionary, as an array
  wg = WordGame.new # the class that "knows" the find_subset_words &
                    # find_superset_words methods
  result.each do |value|
    wg.word = value
    supersets = wg.find_superset_words.values.flatten
    subsets = wg.find_subset_words.values.flatten
    result.delete(value) unless (supersets.empty? && subsets.empty?)
    result.reject! { |cand| supersets.include? cand }
    result.reject! { |cand| subsets.include? cand }
  end
  result
end
Note: find_superset_words and find_subset_words both return hashes, hence the values.flatten bit
It is inadvisable to modify a collection while iterating over it. Instead, either iterate over a copy of the collection, or create a separate array of things to remove later.
One way to accomplish this is with Array#delete_if. Here's my run at it so you get the idea:
supersets_and_subsets = []
result.delete_if do |el|
  wg.word = el
  superset_and_subset = wg.find_superset_words.values.flatten + wg.find_subset_words.values.flatten
  supersets_and_subsets << superset_and_subset
  !superset_and_subset.empty?
end
result -= supersets_and_subsets.flatten.uniq
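To make the two safer options mentioned above concrete (iterate over a copy, or collect the removals for later), here is a minimal sketch. The related? helper is a made-up stand-in for the WordGame lookups; it only checks substring containment, so treat it as an assumption and not as the real subset/superset test.

```ruby
# Hypothetical stand-in for the WordGame lookups: a word counts as "related"
# if some other word in the list contains it or is contained in it.
def related?(word, list)
  list.any? { |other| other != word && (other.include?(word) || word.include?(other)) }
end

words = %w[cat catch dog river rivers]

# Option 1: iterate over a snapshot (a copy) while deleting from the working array.
snapshot = words.dup
survivors = words.dup
snapshot.each { |w| survivors.delete(w) if related?(w, snapshot) }

# Option 2: collect what should go first, then remove it in one step afterwards.
to_remove = words.select { |w| related?(w, words) }
survivors_later = words - to_remove

p survivors        # words with no related partner
p survivors_later  # same result via the second option
```

Either way the array being iterated is never the one being mutated, which is what avoids the skipped elements.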
Here's what I came up with based on your feedback (plus a further optimization by starting with the shortest words):
def collect_dead_words
  result = []
  collection = #file
  num = collection.max_by(&:length).length
  1.upto(num) do |index|
    subset_by_length = collection.select {|word| word.length == index }
    while !subset_by_length.empty? do
      wg = WordGame.new(subset_by_length[0])
      supermatches = wg.find_superset_words.values.flatten
      submatches = wg.find_subset_words.values.flatten
      collection.reject! { |cand| supermatches.include? cand }
      collection.reject! { |cand| submatches.include? cand }
      result << wg.word if (supermatches.empty? && submatches.empty?)
      subset_by_length.delete(wg.word)
      collection.delete(wg.word)
    end
  end
  result
end
Further optimizations are welcome!
The problem
As I understand, string s1 is a subset of string s2 if s1 == s2 after zero or more characters are removed from s2; that is, if there exists a mapping m of the indices of s1 such that [1]:
for each index i of s1, s1[i] = s2[m(i)]; and
if i < j then m(i) < m(j).
Further s2 is a superset of s1 if and only if s1 is a subset of s2.
Note that for s1 to be a subset of s2, s1.size <= s2.size must be true.
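As a rough illustration of the definition above, the mapping m can be found greedily with a single left-to-right scan of s2. The method name subsequence? is my own, and the sketch deliberately ignores the OP's extra include? restriction discussed in the footnote.

```ruby
# Returns true if s1 can be obtained from s2 by deleting zero or more
# characters (order preserved). Hypothetical helper name.
def subsequence?(s1, s2)
  i = 0 # index of the next character of s1 we still need to find
  s2.each_char do |ch|
    i += 1 if i < s1.size && s1[i] == ch
  end
  i == s1.size
end

p subsequence?("cat", "craft")  # "r" and "f" can be dropped
p subsequence?("cat", "cutie")  # no "a" available
p subsequence?("cat", "enact")  # letters present, but in the wrong order
```

The greedy scan is enough here because matching each character of s1 at its earliest possible position in s2 never rules out a later match.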
For example:
"cat" is a subset of "craft" because the latter becomes "cat" if the "r" and "f" are removed.
"cat" is not a subset of "cutie" because "cutie" has no "a".
"cat" is not a superset of "at" because "cat".include?("at") #=> true.
"cat" is not a subset of "enact" because m(0) = 3 and m(1) = 2, but m(0) < m(1) is false.
Algorithm
Subset (and hence superset) is a transitive relation, which permits significant algorithmic efficiencies. By this I mean that if s1 is a subset of s2 and s2 is a subset of s3, then s1 is a subset of s3.
I will proceed as follows:
Create empty sets neither_sub_nor_sup and longest_sups and an empty array subs_and_sups.
Sort the words in the dictionary by length, longest first.
Add w to neither_sub_nor_sup, where w is the longest word in the dictionary.
For each subsequent word w in the dictionary (longest to shortest), perform the following operations:
for each element u of neither_sub_nor_sup determine if w is a subset of u. If it is, move u from neither_sub_nor_sup to longest_sups and append u to subs_and_sups.
if one or more elements were moved from neither_sub_nor_sup to longest_sups, append w to subs_and_sups; else add w to neither_sub_nor_sup.
Return subs_and_sups.
Code
require 'set'
def identify_subs_and_sups(dict)
  neither_sub_nor_sup, longest_sups = Set.new, Set.new
  dict.sort_by(&:size).reverse.each_with_object([]) do |w,subs_and_sups|
    switchers = neither_sub_nor_sup.each_with_object([]) { |u,arr|
      arr << u if w.subset(u) }
    if switchers.any?
      subs_and_sups << w
      switchers.each do |u|
        neither_sub_nor_sup.delete(u)
        longest_sups << u
        subs_and_sups << u
      end
    else
      neither_sub_nor_sup << w
    end
  end
end
class String
  def subset(w)
    w =~ Regexp.new(self.gsub(/./) { |m| "#{m}\\w*" })
  end
end
Example
dict = %w| cat catch craft cutie enact trivial rivert river |
#=> ["cat", "catch", "craft", "cutie", "enact", "trivial", "rivert", "river"]
identify_subs_and_sups(dict)
#=> ["river", "rivert", "cat", "catch", "craft"]
Variant
Rather than processing the words in the dictionary from longest to shortest, we could instead order them shortest to longest:
def identify_subs_and_sups1(dict)
  neither_sub_nor_sup, shortest_sups = Set.new, Set.new
  dict.sort_by(&:size).each_with_object([]) do |w,subs_and_sups|
    switchers = neither_sub_nor_sup.each_with_object([]) { |u,arr|
      arr << u if u.subset(w) }
    if switchers.any?
      subs_and_sups << w
      switchers.each do |u|
        neither_sub_nor_sup.delete(u)
        shortest_sups << u
        subs_and_sups << u
      end
    else
      neither_sub_nor_sup << w
    end
  end
end
identify_subs_and_sups1(dict)
#=> ["craft", "cat", "rivert", "river"]
Benchmarks
(to be continued...)
[1] The OP stated (in a later comment) that s1 is not a subset of s2 if s2.include?(s1) #=> true. I am going to pretend I never saw that, as it throws a spanner into the works. Unfortunately, subset is no longer a transitive relation with that additional requirement. I haven't investigated the implications of that, but I suspect it means a rather brutish algorithm would be required, possibly requiring pairwise comparisons of all the words in the dictionary.
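For what it's worth, a brutish pairwise version might look like the sketch below. It assumes my reading of the OP's stricter rule (a shorter word that appears as a contiguous substring does not count as a subset); the names strict_subset? and dead_words are invented for the illustration, and the cost is quadratic in the dictionary size.

```ruby
# Plain subsequence test: order preserved, gaps allowed.
def subsequence?(s1, s2)
  i = 0
  s2.each_char { |ch| i += 1 if i < s1.size && s1[i] == ch }
  i == s1.size
end

# Assumed "strict" rule: s1 must be shorter than s2, must not occur in s2
# as a contiguous substring, and must be a subsequence of s2.
def strict_subset?(s1, s2)
  return false if s1.size >= s2.size
  return false if s2.include?(s1)
  subsequence?(s1, s2)
end

# Words that are neither a strict subset nor a strict superset of any other word.
def dead_words(dict)
  dict.reject do |w|
    dict.any? { |o| o != w && (strict_subset?(w, o) || strict_subset?(o, w)) }
  end
end

p dead_words(%w[cat catch craft cutie enact at])
```

Because every pair is compared, nothing here relies on transitivity, which is the property the extra rule breaks.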
I have one array like below
[["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
And I want result like below
"GJ, MP, KL, HR, MH"
Start with the first element of the array, ["GJ","MP"], which gives answer_string = "GJ, MP".
Now take MP, the last element of that pair, and find the other pair in which it is the first element: ["MP","KL"].
Then add KL to the string, giving answer_string = "GJ, MP, KL", and continue in the same way.
This is the output I want.
Given
ary = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
(where each element is in fact an edge in a simple graph that you need to traverse) your task can be solved in a quite straightforward way:
acc = ary.first.dup
ary.size.times do
  # Find an edge whose "from" value is equal to the latest "to" one
  next_edge = ary.find { |a, _| a == acc.last }
  acc << next_edge.last if next_edge
end
acc
#=> ["GJ", "MP", "KL", "HR", "MH"]
The downside is quadratic time (you search through the whole array on each iteration), which will hurt if the initial array is large. It would be faster to use an auxiliary data structure with faster lookup (a hash, for instance). Something like:
head, *tail = ary
edges = tail.to_h
tail.reduce(head.dup) { |acc, (k, v)| acc << edges[acc.last] }
#=> ["GJ", "MP", "KL", "HR", "MH"]
(I'm not joining the resulting array into a string but this is kinda straightforward)
d = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
o = [] # List for output
c = d[0][0] # Save the current first object
loop do # Keep looping through until there are no matching pairs
  o.push(c) # Push the current first object to the output
  n = d.index { |a| a[0] == c } # Get the index of the first matched pair of the current `c`
  break if n == nil # If no index is found, we've essentially gotten to the end of the graph
  c = d[n][1] # Update the current first object
end
puts o.join(',') # Join the results
Updated, as the question was dramatically changed. Essentially, you are navigating a graph.
I use arr.size.times to loop
def check arr
  new_arr = arr.first #new_arr = ["GJ","MP"]
  arr.delete_at(0) # remove the first of arr. arr = [["HR","MH"],["MP","KL"],["KL","HR"]]
  arr.size.times do
    find = arr.find {|e| e.first == new_arr.last}
    new_arr << find.last if find
  end
  new_arr.join(',')
end
array = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
p check(array)
#=> "GJ,MP,KL,HR,MH"
Assumptions:
a is an Array or a Hash
a is in the form provided in the Original Post
For each element b in a, b[0] is unique
The first thing I would do, if a is an Array, is convert it to a Hash for faster, easier lookup (this is not technically necessary, but it simplifies the implementation and should improve performance)
a = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
a.to_h
#=> {"GJ"=>"MP", "HR"=>"MH", "MP"=>"KL", "KL"=>"HR"}
UPDATE
If the path will always run from the first element to the end of the chain, and the elements always form a complete chain, then borrowing from #KonstantinStrukov's inspiration: (If you prefer this option then please give him the credit ✔️)
a.to_h.then {|edges| edges.reduce { |acc,_| acc << edges[acc.last] }}.join(",")
#=> "GJ,MP,KL,HR,MH"
Caveat: If there are disconnected elements in the original this result will contain nil (represented as trailing commas). This could be solved with the addition of Array#compact but it will also cause unnecessary traversals for each disconnected element.
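One possible way around both the nil entries and the extra traversals is to walk the hash only until the chain actually ends. This is a minimal sketch; the method name chain and the cycle guard are my own additions and are not part of the one-liner above.

```ruby
# Follows edges from the first pair until no continuation exists.
# Stops early on a cycle so the loop cannot run forever.
def chain(pairs)
  edges = pairs.to_h
  path = pairs.first.dup
  while (nxt = edges[path.last]) && !path.include?(nxt)
    path << nxt
  end
  path
end

a = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]]
puts chain(a).join(",")

# A disconnected pair (["X","Y"]) is simply never visited.
puts chain([["A","B"],["X","Y"],["B","C"]]).join(",")
```

The path.include? check is linear, so for very long chains a Set of visited nodes would be the cheaper guard.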
ORIGINAL
We can use a recursive method to look up the path from a given key to the end of the path. Default key is a[0][0]
def navigate(h,from:h.keys.first)
  return unless h.key?(from)
  [from, *navigate(h,from:h[from]) || h[from]].join(",")
end
Explanation:
navigate(h,from:h.keys.first) - Hash to traverse and the starting point for traversal
return unless h.key?(from) - if the Hash does not contain the from key, return nil (end of the chain)
[from, *navigate(h,from:h[from]) || h[from]].join(",") - build an Array of the from key and the recursive result of looking up the value for that key; if the recursion returns nil, append the last value instead. Then convert the Array to a String, joining the elements with a comma.
Usage:
a = [["GJ","MP"],["HR","MH"],["MP","KL"],["KL","HR"]].to_h
navigate(a)
#=> "GJ,MP,KL,HR,MH"
navigate(a,from: "KL")
#=> "KL,HR,MH"
navigate(a,from: "X")
#=> nil
I'm quite new to Ruby, and was hoping to get the difference between two arrays.
I am aware of the usual method:
a = [...]
b = [...]
difference = (a-b)+(b-a)
But the problem is that this computes the set difference, because in Ruby the expression (a-b) is the relative complement of b in a.
This means [1,2,2,3,4,5,5,5,5] - [5] = [1,2,2,3,4], because it takes out all occurrences of 5 in the first array, not just one, behaving like a filter on the data.
I want it to remove differences only once, so for example, the difference of [1,2,2,3,4,5,5,5,5], and [5] should be [1,2,2,3,4,5,5,5], removing just one 5.
I could do this iteratively:
a = [...]
b = [...]
complimentAbyB = a.dup
complimentBbyA = b.dup
b.each do |bValue|
  complimentAbyB.delete_at(complimentAbyB.index(bValue) || complimentAbyB.length)
end
a.each do |aValue|
  complimentBbyA.delete_at(complimentBbyA.index(aValue) || complimentBbyA.length)
end
difference = complimentAbyB + complimentBbyA
But this seems awfully verbose and inefficient. I have to imagine there is a more elegant solution than this. So my question is basically, what is the most elegant way of finding the difference of two arrays, where if one array has more occurrences of a single element than the other, they will not all be removed?
I recently proposed that such a method, Array#difference, be added to Ruby's core. For your example, it would be written:
a = [1,2,2,3,4,5,5,5,5]
b = [5]
a.difference b
#=> [1,2,2,3,4,5,5,5]
The example I've often given is:
a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]
a.difference b
#=> [1, 3, 2, 2]
I first suggested this method in my answer here. There you will find an explanation and links to other SO questions where I proposed use of the method.
As shown at the links, the method could be written as follows:
class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
ha = a.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
hb = b.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
ha.merge(hb){|_, va, vb| (va - vb).abs}.inject([]){|a, (k, v)| a + [k] * v}
ha and hb are hashes with the element in the original array as the key and the number of occurrences as the value. The following merge puts them together and creates a hash whose value is the difference of the number of occurrences in the two arrays. inject converts that to an array that has each element repeated by the number given in the hash.
Another way:
ha = a.group_by(&:itself)
hb = b.group_by(&:itself)
ha.merge(hb){|k, va, vb| [k] * (va.length - vb.length).abs}.values.flatten
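The counting idea described above can also be sketched with Enumerable#tally, assuming Ruby 2.7 or later. The method name multiset_sym_diff is made up for this example; it returns each element repeated by the absolute difference of its counts in the two arrays.

```ruby
# Symmetric multiset difference: for every distinct element, keep as many
# copies as the two arrays disagree on. Hypothetical helper name.
def multiset_sym_diff(a, b)
  ca = a.tally # element => number of occurrences in a
  cb = b.tally # element => number of occurrences in b
  (ca.keys | cb.keys).flat_map do |k|
    [k] * (ca.fetch(k, 0) - cb.fetch(k, 0)).abs
  end
end

p multiset_sym_diff([1,2,2,3,4,5,5,5,5], [5])
```

Elements come out grouped by value in first-seen order, so the original positions within a are not preserved.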
Given two arrays of equal size, how can I find the number of matching elements disregarding the position?
For example:
[0,0,5] and [0,5,5] would return a match of 2 since there is one 0 and one 5 in common;
[1,0,0,3] and [0,0,1,4] would return a match of 3 since there are two matches of 0 and one match of 1;
[1,2,2,3] and [1,2,3,4] would return a match of 3.
I tried a number of ideas, but they all tend to get rather gnarly and convoluted. I'm guessing there is some nice Ruby idiom, or perhaps a regex that would be an elegant answer to this solution.
You can accomplish it with count:
a.count{|e| index = b.index(e) and b.delete_at index }
or with inject:
a.inject(0){|count, e| count + ((index = b.index(e) and b.delete_at index) ? 1 : 0)}
or with select and length (or its alias, size):
a.select{|e| (index = b.index(e) and b.delete_at index)}.size
Results:
a, b = [0,0,5], [0,5,5] output: => 2;
a, b = [1,2,2,3], [1,2,3,4] output: => 3;
a, b = [1,0,0,3], [0,0,1,4] output: => 3.
(arr1 & arr2).map { |i| [arr1.count(i), arr2.count(i)].min }.inject(0, &:+)
Here (arr1 & arr2) returns the list of unique values that both arrays contain, and arr.count(i) counts the occurrences of i in an array.
Another use for the mighty (and much needed) Array#difference, which I defined in my answer here. This method is similar to Array#-. The difference between the two methods is illustrated in the following example:
a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]
a - b #=> [1]
a.difference b #=> [1, 3, 2, 2]
For the present application:
def number_matches(a,b)
  left_in_b = b
  a.reduce(0) do |t,e|
    if left_in_b.include?(e)
      left_in_b = left_in_b.difference [e]
      t+1
    else
      t
    end
  end
end
number_matches [0,0,5], [0,5,5] #=> 2
number_matches [1,0,0,3], [0,0,1,4] #=> 3
number_matches [1,2,2,3], [1,2,3,4] #=> 3
Using the multiset gem:
(Multiset.new(a) & Multiset.new(b)).size
Multiset is like Set, but allows duplicate values. & is the "set intersection" operator (return all things that are in both sets).
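If pulling in a gem is not an option, roughly the same multiset-intersection count can be sketched with Enumerable#tally (assuming Ruby 2.7 or later). The method name multiset_intersection_size is invented for this illustration and is not part of the multiset gem.

```ruby
# Size of the multiset intersection: for each distinct element of a,
# count it as many times as it appears in both arrays.
def multiset_intersection_size(a, b)
  counts_b = b.tally
  a.tally.sum { |elem, n| [n, counts_b.fetch(elem, 0)].min }
end

p multiset_intersection_size([0,0,5], [0,5,5])
p multiset_intersection_size([1,0,0,3], [0,0,1,4])
p multiset_intersection_size([1,2,2,3], [1,2,3,4])
```

Neither input array is modified, unlike the delete_at based versions.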
I don't think this is an ideal answer, because it's a bit complex, but...
def count(arr)
  arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
end
def matches(a1, a2)
  m = 0
  a1_counts = count(a1)
  a2_counts = count(a2)
  a1_counts.each do |e, c|
    # add the smaller of the two occurrence counts for this element
    m += [c, a2_counts[e]].min
  end
  m
end
Basically, first write a method that creates a hash from an array of the number of times each element appears. Then, use those to sum up the smallest number of times each element appears in both arrays.
I'm interested in possible ways that different languages can Join an array, but rather than using a single join string, using a different join string at given intervals.
For example (hypothetical language):
Array.modJoin([mod, char], ...)
e.g. [1,2,3,4,5].modJoin( [1, ","], [2, ":"] )
Where the arguments specify an array or object containing a modulo and a join char, the implementation would check which order of modulo took precedence (the latest one), and apply the join char. (requiring that the [mod,char]'s were provided in ascending mod order)
i.e.
if (index % (nth mod) == 0)
append (join char)
continue to next index
else
(nth mod)-- repeat
when complete join with ""
For example, I've come up with the following in Ruby, but I suspect better / more elegant methods exist, and that's what I'd like to see.
#Create test Array
#9472 - 9727 as HTML Entities (unicode box chars)
range = (9472..9727).to_a.map{|u| "&##{u};" }
Assuming we have a list of mod's and join chars, we have to stipulate that mods increase in value as the list progresses.
mod_joins = [{m:1, c:","}, {m:12, c:"<br/>"}]
Now process the range with mod_joins
processed = range.each_with_index.map { |e, i|
  m = nil
  mod_joins.reverse_each {|j|
    m = j and break if i % j[:m] == 0
  }
  # use the lowest mod for index 0
  m = mod_joins[0] if i == 0
  m.nil? ? e : "#{e}#{m[:c]}"
}
#output the result string
puts processed.join ""
This gives a list of HTML entities separated by "," except where the index is a multiple of 12, in which case the separator is <br/>.
So, I'm interested in ways this can be done more elegantly, primarily in functional languages like Haskell, F#, Common Lisp (Scheme, Clojure) etc. but also cool ways to achieve this in general purpose languages that have list comprehension extensions such as C# with Linq, Ruby and Python or even Perl.
Here's a pure-functional version written in Python. I'm sure it can be adapted easily enough to other languages.
import itertools

def calcsep(num, sepspecs):
    '''
    num: current character position
    sepspecs: dict containing modulus:separator entries
    '''
    mods = reversed(sorted(sepspecs))
    return sepspecs[next(x for x in mods if num % x == 0)]

vector = [str(x) for x in range(12)]
result = [y for ix, el in enumerate(vector)
          for y in (calcsep(ix, {1:'.', 3:',', 5:';'}), el)]
print(''.join(itertools.islice(result, 1, None)))
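As a rough Ruby adaptation of the same idea (my own sketch, not the answerer's code): the separator placed before the element at index i is the one belonging to the largest modulus that divides i, and nothing is placed before index 0. The name mod_join and the hash-of-rules argument are assumptions made for the example.

```ruby
# rules maps modulus => separator, e.g. {1 => ".", 3 => ","}.
# The largest modulus dividing the index wins.
def mod_join(items, rules)
  mods = rules.keys.sort.reverse # check the largest modulus first
  items.each_with_index.reduce(+"") do |out, (item, i)|
    sep = i.zero? ? "" : rules[mods.find { |m| (i % m).zero? }].to_s
    out << sep << item.to_s
  end
end

puts mod_join((0...12).map(&:to_s), {1 => ".", 3 => ",", 5 => ";"})
```

If no modulus divides an index (for instance when there is no entry for 1), the sketch falls back to an empty separator instead of raising.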
Here’s a simpler and more readable solution in Ruby
array = (9472..9727).map{|u|"&##{u};"}
array.each_slice(12).collect{|each|each.join(",")}.join("<br/>")
or for the general case
module Enumerable
  def fancy_join(instructions)
    return self.join if instructions.empty?
    separator = instructions.delete(mod = instructions.keys.max)
    self.each_slice(mod).collect{|each|each.fancy_join(instructions.dup)}.join(separator)
  end
end
range = (9472..9727).map{|u|"&##{u};"}
instructions = {1=>",",12=>"<br/>"}
puts range.fancy_join(instructions)
I have an array of filenames. A subset of these may have similar pattern like this (alphabet strings with a number at the end):
arr = %w[
WordWord1.html
WordWord3.html
WordWord10.html
WordWord11.html
AnotherWord1.html
AnotherWord2.html
FileFile.html
]
How can I identify the similar ones (they share an identical substring and differ only in their numbers) and group them into arrays?
['WordWord1.html', 'WordWord3.html', 'WordWord10.html', 'WordWord11.html']
['AnotherWord1.html', 'AnotherWord2.html']
['FileFile.html']
arr.group_by { |x| x[/[a-zA-Z]+/] }.values
filenames = ["WordWord1.html", "WordWord3.html", "WordWord10.html", "WordWord11.html", "AnotherWord1.html", "AnotherWord2.html", "FileFile.html"]
filenames.inject({}){|h,f|k = f.split(/[^a-zA-Z]/, 2).first;h[k] ||= [];h[k] << f; h}
arr = %w[
WordWord1.html
WordWord3.html
WordWord10.html
WordWord11.html
AnotherWord1.html
AnotherWord2.html
FileFile.html
]
result = {}
arr.each do |a|
  prefix = a.match(/[A-Za-z]+/).to_s
  if result[prefix]
    result[prefix] << a
  else
    result[prefix] = [a]
  end
end
p result