I'm working on a problem around Instagram hashtags. Users often have "bundles" of hashtags that they copy and paste when they are posting images. Different bundles for different topics.
So I might have my "Things from the garden" bundle, which would be ["garden", "beautifullawns", "treesoutside", "greenlondon"] and so on. They're often twenty to thirty items long.
Sometimes they might have several of these to keep things varied.
What I want to do is look at the images they have posted in the past and recommend a bundle of tags to use.
To do that I would have several arrays of tags that they have used previously:
x = ["a", "b", "c", "d", "e"]
y = ["a", "b", "d", "e", "f", "g"]
z = ["a", "c", "d", "e", "f", "h"]
...
I'd like to find the largest common subsets of entries across these arrays.
So in this case, the largest subset would be ["a", "d", "e"] within those three. That's simple enough to achieve naively by using something like x & y & z.
However, I'd like to create a ranking of these subsets based on their size and frequency within all of the arrays under consideration, so that I can display the most commonly used bundles of tags:
[
{bundle: ["a","d","e"], frequency: 3, size: 3},
{bundle: ["e","f"], frequency: 2, size: 2},
{bundle: ["a","b"], frequency: 2, size: 2},
{bundle: ["b","d"], frequency: 2, size: 2},
...
]
Presumably, with a limitation on the minimum size of these bundles, say two items.
I'm using Elasticsearch for indexing, but I've found that attempting to do this with aggregations is challenging, so I'm pulling out the images into Ruby and then working there to create the listing.
As a first pass, I've looped over all of these arrays and intersected each one with every other array, using an MD5 hash of the result as a unique identifier. But this limits the results to pairwise intersections. Adding further passes would, I suspect, make this approach quite inefficient.
require 'digest'
x = ["a", "b", "c", "d", "e"]
y = ["a", "b", "d", "e", "f", "g"]
z = ["a", "c", "d", "e", "f", "h"]
def bundle_report arrays
arrays = arrays.collect(&:sort)
working = {}
arrays.each do |array|
arrays.each do |comparison|
next if array == comparison
subset = array & comparison
key = Digest::MD5.hexdigest(subset.join(""))
working[key] ||= {subset: subset, frequency: 0}
working[key][:frequency] += 1
working[key][:size] = subset.length
end
end
working
end
puts bundle_report([x, y, z])
=> {"bb4a3fb7097e63a27a649769248433f1"=>{:subset=>["a", "b", "d", "e"], :frequency=>2, :size=>4}, "b6fdd30ed956762a88ef4f7e8dcc1cae"=>{:subset=>["a", "c", "d", "e"], :frequency=>2, :size=>4}, "ddf4a04e121344a6e7ee2acf71145a99"=>{:subset=>["a", "d", "e", "f"], :frequency=>2, :size=>4}}
Adding a second pass gets this to a better result:
def bundle_report arrays
arrays = arrays.collect(&:sort)
working = {}
arrays.each do |array|
arrays.each do |comparison|
next if array == comparison
subset = array & comparison
key = Digest::MD5.hexdigest(subset.join(""))
working[key] ||= {subset: subset, frequency: 0}
working[key][:frequency] += 1
working[key][:size] = subset.length
end
end
original_working = working.dup
original_working.each do |key, item|
original_working.each do |comparison_key, comparison|
next if item == comparison
subset = item[:subset] & comparison[:subset]
key = Digest::MD5.hexdigest(subset.join(""))
working[key] ||= {subset: subset, frequency: 0}
working[key][:frequency] += 1
working[key][:size] = subset.length
end
end
working
end
puts bundle_report([x, y, z])
=> {"bb4a3fb7097e63a27a649769248433f1"=>{:subset=>["a", "b", "d", "e"], :frequency=>2, :size=>4}, "b6fdd30ed956762a88ef4f7e8dcc1cae"=>{:subset=>["a", "c", "d", "e"], :frequency=>2, :size=>4}, "ddf4a04e121344a6e7ee2acf71145a99"=>{:subset=>["a", "d", "e", "f"], :frequency=>2, :size=>4}, "a562cfa07c2b1213b3a5c99b756fc206"=>{:subset=>["a", "d", "e"], :frequency=>6, :size=>3}}
Can you suggest an efficient way to establish this ranking of large subsets?
Rather than intersect every array with every other array, which might quickly get out of hand, I'd be tempted to keep a persistent index (in Elasticsearch?) of all the combinations seen so far, along with a count of their frequency. Then, for every new set of tags, increment the frequency count by 1 for each sub-combination of that set.
Here's a quick sketch:
require 'digest'
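def bundle_report(arrays, min_size = 2, max_size = 10)
  combination_index = {}
  arrays.each do |array|
    # Sort (and de-duplicate) first so the same tags in a different order share one key
    tags = array.uniq.sort
    (min_size..[max_size, tags.length].min).each do |length|
      tags.combination(length).each do |combination|
        # Join with a separator so ["ab", "c"] and ["a", "bc"] do not collide
        key = Digest::MD5.hexdigest(combination.join(','))
        combination_index[key] ||= {bundle: combination, frequency: 0, size: length}
        combination_index[key][:frequency] += 1
      end
    end
  end
  # Most frequent first; among equal frequencies, larger bundles first
  combination_index.to_a.sort_by {|x| [x[1][:frequency], x[1][:size]] }.reverse
end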
input_arrays = [
["a", "b", "c", "d", "e"],
["a", "b", "d", "e", "f", "g"],
["a", "c", "d", "e", "f", "h"]
]
bundle_report(input_arrays)[0..5].each do |x|
puts x[1]
end
Which results in:
{:bundle=>["a", "d", "e"], :frequency=>3, :size=>3}
{:bundle=>["d", "e"], :frequency=>3, :size=>2}
{:bundle=>["a", "d"], :frequency=>3, :size=>2}
{:bundle=>["a", "e"], :frequency=>3, :size=>2}
{:bundle=>["a", "d", "e", "f"], :frequency=>2, :size=>4}
{:bundle=>["a", "b", "d", "e"], :frequency=>2, :size=>4}
This might not scale very well either though.
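One way to rein in the growth is to count level by level and discard any tag set seen fewer than some minimum number of times, since no larger set containing it can be more frequent. This is the idea behind Apriori-style frequent itemset counting, a different technique from the sketch above. The following is a minimal sketch under that assumption; `frequent_bundles` and the `min_frequency` threshold are hypothetical names, not something from the question.

```ruby
# Hypothetical helper: level-wise ("Apriori"-style) counting of tag sets.
def frequent_bundles(arrays, min_size: 2, min_frequency: 2)
  tag_sets = arrays.map { |tags| tags.uniq.sort }

  # Level 1: count single tags and keep only the frequent ones.
  counts = Hash.new(0)
  tag_sets.each { |tags| tags.each { |tag| counts[[tag]] += 1 } }
  current = counts.select { |_, n| n >= min_frequency }

  results = {}
  until current.empty?
    current.each { |bundle, n| results[bundle] = n if bundle.size >= min_size }

    # Grow each frequent bundle by one tag that sorts after its last tag,
    # so every candidate is counted exactly once per array that contains it.
    next_counts = Hash.new(0)
    tag_sets.each do |tags|
      current.each_key do |bundle|
        next unless (bundle - tags).empty?
        tags.each { |tag| next_counts[bundle + [tag]] += 1 if tag > bundle.last }
      end
    end
    current = next_counts.select { |_, n| n >= min_frequency }
  end

  # Most frequent first, then larger bundles, then alphabetical for stable output.
  results.map { |bundle, n| { bundle: bundle, frequency: n, size: bundle.size } }
         .sort_by { |r| [-r[:frequency], -r[:size], r[:bundle]] }
end

input_arrays = [
  ["a", "b", "c", "d", "e"],
  ["a", "b", "d", "e", "f", "g"],
  ["a", "c", "d", "e", "f", "h"]
]
puts frequent_bundles(input_arrays).first(4)
```

With real data, raising `min_frequency` is what keeps the number of candidates manageable; with a threshold of 1 this degenerates into counting every combination again.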
Ruby 2.4. I have an array of strings
2.4.0 :007 > arr = ["a", "b", "g", "e", "f", "h", "i"]
=> ["a", "b", "g", "e", "f", "h", "i"]
How do I split my array into smaller arrays based on a condition? I have a function -- "contains_vowel," which returns true if a string contains "a", "e", "i", "o", or "u". How would I take an array of strings and split it into smaller arrays, using a divider function of "contains_vowel"? That is, for the above, the resulting array of smaller arrays would be
[["a"], ["b", "g"], ["e"], ["f", "h"], ["i"]]
If an element of the larger array satisfies the condition, it would become an array of one element.
arr = ["a", "b", "g", "e", "f", "i"]
r = /[aeiou]/
arr.slice_when { |a,b| a.match?(r) ^ b.match?(r) }.to_a
=> [["a"], ["b", "g"], ["e"], ["f"], ["i"]]
String#match? made its debut in Ruby v2.4. For earlier versions you could use (for example) !!(b =~ r), where !! converts a truthy/falsy value to true/false. That conversion is needed because the XOR operator ^ serves double duty: it's a logical XOR when a and b in a^b are true, false or nil, and a bitwise XOR when the operands are integers, such as 2^6 #=> 4 (2.to_s(2) #=> "10"; 6.to_s(2) #=> "110"; 4.to_s(2) #=> "100").
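As a sketch of the pre-2.4 variant described above, here is the same slice_when call written with =~ and !!. The array used is the seven-element one from the question's expected result (including "h"), which is an assumption about what was intended.

```ruby
arr = ["a", "b", "g", "e", "f", "h", "i"]
r = /[aeiou]/

# !! turns the Integer-or-nil result of =~ into true/false,
# so that ^ acts as a logical XOR rather than a bitwise one.
groups = arr.slice_when { |a, b| !!(a =~ r) ^ !!(b =~ r) }.to_a
p groups
```

Note that with this block two adjacent vowel strings would stay in the same group, because the vowel state does not change between them.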
One more way to skin a cat
def contains_vowel(v)
v.count("aeiou") > 0
end
def split_by_substring_with_vowels(arr)
arr.chunk_while do |before,after|
!contains_vowel(before) & !contains_vowel(after)
end.to_a
end
split_by_substring_with_vowels(arr)
#=> [["a"], ["b", "g"], ["e"], ["f", "h"], ["i"]]
What it does:
passes each pair of consecutive elements to the block
splits between them when either one contains a vowel
Example with your other Array
arr = ["1)", "dwr", "lyn,", "18,", "bbe"]
split_by_substring_with_vowels(arr)
#=> [["1)", "dwr", "lyn,", "18,"], ["bbe"]]
Further example (if you want consecutive vowel-containing elements to stay in the same group):
def split_by_substring_with_vowels(arr)
arr.chunk_while do |before,after|
v_before,v_after = contains_vowel(before),contains_vowel(after)
(!v_before & !v_after) ^ (v_before & v_after)
end.to_a
end
arr = ["1)", "dwr", "lyn,", "18,", "bbe", "re", "rr", "aa", "ee"]
split_by_substring_with_vowels(arr)
#=> [["1)", "dwr", "lyn,", "18,"], ["bbe", "re"], ["rr"], ["aa", "ee"]]
This keeps two adjacent elements together if neither contains a vowel or if both contain one.
I might use chunk, which splits an array every time the value of its block changes. chunk yields [block_value, [elements]] pairs, so I used .map(&:last) to keep only the sub-lists of elements.
arr = ["a", "b", "g", "e", "f", "h", "i"]
def vowel?(x); %w(a e i o u).include?(x); end
arr.chunk{|x| vowel?(x)}.map(&:last)
=> [["a"], ["b", "g"], ["e"], ["f", "h"], ["i"]]
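The vowel? helper above compares the whole string against single letters, so it suits one-character strings. For longer strings, a minimal sketch with a regex-based predicate (`contains_vowel?` here is a hypothetical stand-in for the asker's function) could look like this:

```ruby
# Hypothetical predicate: true when the string contains any lowercase vowel
def contains_vowel?(str)
  str.match?(/[aeiou]/)
end

arr = ["1)", "dwr", "lyn,", "18,", "bbe", "re", "rr", "aa", "ee"]

# chunk starts a new group whenever the block's value changes
groups = arr.chunk { |s| contains_vowel?(s) }.map(&:last)
p groups
```

Because chunk groups on the block's value, consecutive vowel-containing strings end up in the same sub-array rather than in one-element arrays.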
contains_vowel = ->(str) { !(str.split('') & %w|a e i o u|).empty? }
_, result = ["a", "b", "g", "e", "f", "h", "i"].
each_with_object([false, []]) do |e, acc|
cv, acc[0] = acc[0], contains_vowel.(e)
# start a new group on a vowel flip, or when there is no group yet (first element)
(cv ^ acc.first) || acc.last.empty? ? acc.last << [e] : acc.last.last << e
end
result
#⇒ [["a"], ["b", "g"], ["e"], ["f", "h"], ["i"]]
What we do here:
contains_vowel is a lambda to check whether the string contains a vowel or not.
we reduce the input array, carrying along two things: whether the previously handled string contained a vowel, and the result so far.
cv ^ acc.first checks whether the vowel state flipped on this step.
if it did, we append a new array to the result;
if it did not, we append the string to the last array in the result.
I'm using Ruby 2.4. If I have a group of array indexes, and I want to delete all the elements at those indexes, how do I do that? I tried the below, but it's leaving stuff out
2.4.0 :005 > indexes_to_delete = [7, 8, 9]
=> [7, 8, 9]
2.4.0 :008 > a = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
2.4.0 :009 > indexes_to_delete.each do |index| a.delete_at(index) end
=> [7, 8, 9]
2.4.0 :010 > a
=> ["a", "b", "c", "d", "e", "f", "g", "i"]
Notice the ending array. I should have only seven elements in the array, since I started with ten and then specified three indexes of elements to delete. Yet I have eight elements. How do I adjust my statement to delete the elements at the indexes I specified?
Every time you delete an item from an array the indexes change.
So you could do this:
3.times { a.delete_at(7) }
Which has the same effect of deleting at 7,8,9
Or use slice! as recommended here: How to delete a range of values from an array?
a.slice!(7..9)
To work with arbitrary arrays, I think the obvious choice would be reject with an index:
a.reject.with_index { |item, idx| indexes_to_delete.include? idx }
This is non-mutating so you'd set a variable equal to the result.
Here are a couple of ways to do that.
indexes_to_delete = [3, 8, 9]
a = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
#1
indexes_to_delete.sort.reverse_each { |i| a.delete_at(i) }
a #=> ["a", "b", "c", "e", "f", "g", "h"]
This mutates a. If that's not desired, operate on a copy of a (a.dup).
Just remember that you must delete the elements in the reverse order of their indexes.
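To see why the order matters, here is a small sketch comparing forward and reverse deletion on copies of the same array (the variable names are only for illustration):

```ruby
letters = ("a".."j").to_a
indexes = [3, 8, 9]

# Forward order: each deletion shifts the later elements left by one,
# so index 8 now points at "j" and index 9 no longer exists.
forward = letters.dup
indexes.each { |i| forward.delete_at(i) }

# Reverse order: deleting from the back leaves the lower indexes untouched.
reverse = letters.dup
indexes.sort.reverse_each { |i| reverse.delete_at(i) }

p forward  # eight elements, "i" survives
p reverse  # seven elements, as intended
```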
#2
a.values_at(*(0..a.size-1).to_a - indexes_to_delete)
#=> ["a", "b", "c", "e", "f", "g", "h"]
First we calculate the "keepers"
(0..a.size-1).to_a - indexes_to_delete
#=> [0, 1, 2, 4, 5, 6, 7]
This does not mutate a. If a is to be mutated write
a.replace(a.values_at(*(0..a.size-1).to_a - indexes_to_delete))
#=> ["a", "b", "c", "e", "f", "g", "h"]
a #=> ["a", "b", "c", "e", "f", "g", "h"]
@maxple gives a third way (reject.with_index), which reads best to me. I doubt there are significant efficiency differences among the three.
Your problem is that the indexing changes when you do the delete. So don't delete at first.
a = ["a","b","c","d","e","f","g","h","i","j"]
to_delete = [7,8,9]
to_delete.each { |i| a[i] = nil }
a.compact!
Set the elements you want to delete to nil, and then compact the array to get rid of those elements.
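If you have nil elements you want to keep, you can instead mark the elements to delete with a value that you know will never occur in the array:
MARK = Object.new # one possible choice: a unique sentinel that cannot equal any existing element
to_delete.each { |i| a[i] = MARK }
a.delete(MARK)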
Suppose there is an array like this one:
list = ["a", "a", "a", "b", "b", "c", "d", "e", "e"]
We want to create a cycle where every next element is different from the previous element and the first element is different from the last element.
required = ["a", "b", "a", "b", "a", "c", "e", "d", "e"]
How is this done in ruby?
def create_cycle
temp = Array.new($input)
i, j, counter = 0
while i == 0
while (counter != $input.length)
j = rand(1..$input.length-1).floor
unless !($input[i][0].to_s.eql?$input[j][0])
$solution.push($input[i])
$solution.push($input[j])
puts input[i], input[j]
$input.delete_at(i)
$input.delete_at(j)
counter = counter + 1
end
end
end
end
I'm trying to learn this. Thank you for your help.
Additional notes:
The elements a, b, c, d, e represent special format strings, where
a certain property is common among them, thus the first element "a"
shares a property with the next element "a" but is not equivalent to
the first.
In the case it isn't possible to create a cycle, then, it is enough to raise a flag in command line.
I might do it like this:
>> list = [a, a, a, b, b, c, d, e, e]
>> list.sort.each_slice((list.size/2.0).round).reduce(:zip).flatten.compact
=> [a, c, a, d, a, e, b, e, b]
The general method is to:
sort the list, so all identical members are adjacent
divide the list in half from the middle
interleave the two halves together
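The three steps above can be sketched as a pair of small helpers. Both names are hypothetical, and the elements are assumed to be sortable and non-nil. The second helper checks the result, since interleaving alone does not guarantee a valid cycle.

```ruby
# Hypothetical helper: sort, split in the middle, interleave the halves.
def interleave_halves(list)
  return [] if list.empty?
  half = (list.size / 2.0).ceil            # the first half gets the extra element
  first, second = list.sort.each_slice(half).to_a
  # zip pads the shorter half with nil; compact drops that padding
  first.zip(second || []).flatten(1).compact
end

# True when no two neighbours are equal, including last -> first.
def valid_cycle?(order)
  return true if order.size < 2
  (order + [order.first]).each_cons(2).all? { |x, y| x != y }
end

order = interleave_halves(%w[a a a b b c d e e])
p order
p valid_cycle?(order)
p valid_cycle?(interleave_halves(%w[a a a b]))
```

When valid_cycle? returns false, that is the point at which the "raise a flag" behaviour asked for in the question could be triggered, although a false result here does not by itself prove that no valid arrangement exists.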
Assuming you do not care about preserving the order of the original array, that adjacent duplicates are acceptable when no valid arrangement exists, and that the list is presorted, here is one approach: it keeps taking elements alternately from the beginning and the end of the list until there are none left:
def interleaver list
result = []
el = list.first
while(el)
el = list.shift
if el
result << el
else
return result
end
el = list.pop
if el
result << el
else
return result
end
end
result
end
> a = 'a'
> b = 'b'
> c = 'c'
> d = 'd'
> e = 'e'
> list = [a, a, a, b, b, c, d, e, e]
> interleaver(list)
=> ["a", "e", "a", "e", "a", "d", "b", "c", "b"]
But if such interleaving is not possible, you will get duplicates:
> list = [a, a, a, b]
> interleaver(list)
#=> ["a","b","a","a"]
You can obtain such an ordering, or demonstrate that no such ordering exists, with the following recursive method.
Code
def doit(remaining, partial=[])
first_partial, last_partial = partial.first, partial.last
if remaining.size == 1
return ([first_partial, last_partial] & remaining).empty? ?
partial + remaining : nil
end
remaining.uniq.each_with_index do |s,i|
next if s == last_partial
rem = remaining.dup
rem.delete_at(i)
rv = doit(rem, partial + [s])
return rv if rv
end
nil
end
Examples
list = %w| a a b |
#=> ["a", "a", "b"]
doit list
#=> nil
The above demonstrates that the three elements of list cannot be permuted to satisfy the two ordering requirements.
list = %w| a a a b b c d e e |
#=> ["a", "a", "a", "b", "b", "c", "d", "e", "e"]
doit list
#=> ["a", "b", "a", "b", "c", "b", "e", "d", "e"]
This took 0.0042 seconds to solve on a newish MacBook Pro.
list = %w| a a a a a a a b b c d e e f f f g g g g h i i i i j j |
#=> ["a", "a", "a", "a", "a", "a", "a", "b", "b", "c", "d", "e", "e",
# "f", "f", "f", "g", "g", "g", "g", "h", "i", "i", "i", "i", "j", "j"]
doit list
#=> ["a", "b", "a", "b", "a", "b", "a", "b", "c", "b", "d", "e", "f",
# "e", "f", "g", "f", "g", "h", "g", "h", "i", "j", "i", "j", "i", "j"]
This took 0.0059 seconds to solve.
Out of curiosity, I then tried
list = (%w| a a a a a a a b b c d e e f f f g g g g h i i i i j j |).shuffle
#=> ["a", "c", "f", "b", "d", "i", "a", "a", "i", "a", "a", "g", "g",
# "a", "g", "i", "j", "b", "h", "j", "e", "e", "a", "g", "f", "i", "f"]
doit list
#=> ["a", "c", "f", "b", "d", "i", "a", "i", "a", "g", "a", "g", "a",
# "g", "i", "g", "j", "b", "h", "j", "e", "a", "e", "g", "f", "i", "f"]
This took a whopping 1.16 seconds to solve, suggesting that it may be desirable to pre-sort list (doit(list.sort)) if, of course, list is sortable.
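One detail worth checking in doit: i is an index into remaining.uniq, while delete_at(i) is applied to a copy of remaining, so the element removed is not always s. That would explain why the nine-element result above contains three "b"s and two "a"s although the input has three "a"s and two "b"s. Below is a minimal sketch of a variant that removes the chosen value itself; `cycle_order` is a hypothetical name, and the timings quoted above were not re-measured for it.

```ruby
# Hypothetical variant of the recursive search that removes the chosen value
# (not the element at the same position) from the remaining list.
def cycle_order(remaining, partial = [])
  if remaining.size == 1
    # The final element must differ from both its predecessor and the first element.
    return ([partial.first, partial.last] & remaining).empty? ? partial + remaining : nil
  end
  remaining.uniq.each do |s|
    next if s == partial.last             # never repeat the previous element
    rem = remaining.dup
    rem.delete_at(rem.index(s))           # drop one occurrence of s itself
    result = cycle_order(rem, partial + [s])
    return result if result
  end
  nil                                     # no arrangement works from this prefix
end

p cycle_order(%w[a a b])
p cycle_order(%w[a a a b b c d e e])
```

For the nine-element list this returns ["a", "b", "a", "b", "a", "c", "e", "d", "e"], which is the arrangement given in the question. The search is still exhaustive in the worst case, so long lists with few valid arrangements may take a while.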
I think it's a silly question, lol
I have below array
[['a','b','c'],['d','e','f']]
and want that array to be
['a','b','c'],['d','e','f']
which means I want to remove the first bracket.
Does that make sense?
Thanks in advance.
No, this doesn't really make sense, because ['a','b','c'],['d','e','f'] in this notation is two separate objects/arrays that are not inside any other data structure...
you could do an assignment, like :
a,b = [['a','b','c'],['d','e','f']]
and then
> a
=> ["a", "b", "c"]
> b
=> ["d", "e", "f"]
or better just iterate over the outer array (because you don't know how many elements it has):
input = [['a','b','c'],['d','e','f']]
input.each do |x|
puts "element #{x.inspect}"
end
=>
element ["a", "b", "c"]
element ["d", "e", "f"]
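If the goal is to hand the inner arrays to a method as separate arguments (an assumption, since the question does not say), the splat operator does the "unwrapping" at the call site. A minimal sketch, with `describe` as a hypothetical method:

```ruby
# Hypothetical method that receives each inner array as its own argument
def describe(*groups)
  groups.map { |group| group.join("-") }
end

input = [['a', 'b', 'c'], ['d', 'e', 'f']]

# *input expands the outer array, so this is the same call as
# describe(['a', 'b', 'c'], ['d', 'e', 'f'])
p describe(*input)
```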
It doesn’t make sense. Do you mean a string manipulation?
irb(main):001:0> s = "[['a','b','c'],['d','e','f']]"
=> "[['a','b','c'],['d','e','f']]"
irb(main):002:0> s[1...-1]
=> "['a','b','c'],['d','e','f']"
Or, do you want to flatten an array?
irb(main):003:0> [['a','b','c'],['d','e','f']].flatten
=> ["a", "b", "c", "d", "e", "f"]