I'm working to update the SVG::Graph gem, and have made many improvements to my version, but have found a bottleneck with multiple array sorting.
There is a "sort_multiple" function built in, which keeps an array of arrays (all of equal size) sorted by the first array in the group.
The issue I have is that this sort works well on truly random data, and really badly on sorted, or almost sorted data:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
  if lo < hi
    p = partition(arrys,lo,hi)
    sort_multiple(arrys, lo, p-1)
    sort_multiple(arrys, p+1, hi)
  end
  arrys
end
def partition( arrys, lo, hi )
  p = arrys[0][lo]
  l = lo
  z = lo+1
  while z <= hi
    if arrys[0][z] < p
      l += 1
      arrys.each { |arry| arry[z], arry[l] = arry[l], arry[z] }
    end
    z += 1
  end
  arrys.each { |arry| arry[lo], arry[l] = arry[l], arry[lo] }
  l
end
This routine appears to use a variant of the Lomuto partition scheme described on Wikipedia: https://en.wikipedia.org/wiki/Quicksort#Lomuto_partition_scheme
I have an array of 5000+ numbers, which is previously sorted, and this function adds about 1/2 second per chart.
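To illustrate why already-sorted input is the slow case, here is a minimal single-array sketch that counts comparisons when the pivot is always the first element, as in partition above. The names lomuto_count and count_comparisons are made up for this illustration and are not part of SVG::Graph.

```ruby
# Count the comparisons made by a first-element-pivot quicksort
# (same partition logic as above, reduced to a single array).
def lomuto_count(arr, lo, hi, counter)
  return if lo >= hi
  pivot = arr[lo]
  l = lo
  (lo + 1..hi).each do |z|
    counter[:comparisons] += 1
    if arr[z] < pivot
      l += 1
      arr[z], arr[l] = arr[l], arr[z]
    end
  end
  arr[lo], arr[l] = arr[l], arr[lo]
  lomuto_count(arr, lo, l - 1, counter)
  lomuto_count(arr, l + 1, hi, counter)
end

# Hypothetical wrapper: returns the sorted copy and the comparison count.
def count_comparisons(values)
  counter = { comparisons: 0 }
  arr = values.dup
  lomuto_count(arr, 0, arr.length - 1, counter)
  [arr, counter[:comparisons]]
end

sorted_result, sorted_cost = count_comparisons((1..100).to_a)
# On sorted input every partition peels off just one element,
# so the cost is 99 + 98 + ... + 1 = 4950 comparisons.
```

The count grows quadratically with the array length on sorted input, whereas shuffled input usually needs far fewer comparisons, which matches the benchmark gap reported here.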
I have modified the "sort_multiple" routine with the following:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
  first = arrys.first
  return arrys if first == first.sort
  if lo < hi
  ...
which has "fixed" the problem with sorted data, but I was wondering if there is any way to utilise the better sort functions built into Ruby to get this sort to work much quicker. E.g. do you think I could utilise TSort to speed this up? https://ruby-doc.org/stdlib-2.6.1/libdoc/tsort/rdoc/TSort.html
Looking at my benchmarking, sorting with a completely random first group appears to be very fast.
Current benchmarking:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
  if lo < hi
    p = partition(arrys,lo,hi)
    sort_multiple(arrys, lo, p-1)
    sort_multiple(arrys, p+1, hi)
  end
  arrys
end
def partition( arrys, lo, hi )
  p = arrys[0][lo]
  l = lo
  z = lo+1
  while z <= hi
    if arrys[0][z] < p
      l += 1
      arrys.each { |arry| arry[z], arry[l] = arry[l], arry[z] }
    end
    z += 1
  end
  arrys.each { |arry| arry[lo], arry[l] = arry[l], arry[lo] }
  l
end
first = (1..5400).map { rand }
second = (1..5400).map { rand }
unsorted_arrys = [first.dup, second.dup, Array.new(5400), Array.new(5400), Array.new(5400)]
sorted_arrys = [first.sort, second.dup, Array.new(5400), Array.new(5400), Array.new(5400)]
require 'benchmark'
Benchmark.bmbm do |x|
  x.report("unsorted") { sort_multiple( unsorted_arrys.map(&:dup) ) }
  x.report("sorted") { sort_multiple( sorted_arrys.map(&:dup) ) }
end
results:
Rehearsal --------------------------------------------
unsorted 0.070699 0.000008 0.070707 ( 0.070710)
sorted 0.731734 0.000000 0.731734 ( 0.731742)
----------------------------------- total: 0.802441sec
user system total real
unsorted 0.051636 0.000000 0.051636 ( 0.051636)
sorted 0.715730 0.000000 0.715730 ( 0.715733)
#EDIT#
Final accepted solution:
def sort( *arrys )
  new_arrys = arrys.transpose.sort_by(&:first).transpose
  new_arrys.each_index { |k| arrys[k].replace(new_arrys[k]) }
end
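A small usage sketch of this solution, for anyone wanting to check the in-place behaviour. The method is repeated under the hypothetical name sort_together so the snippet runs on its own; the sample data is invented.

```ruby
# Same body as the accepted solution, renamed (hypothetically) to sort_together.
def sort_together(*arrys)
  new_arrys = arrys.transpose.sort_by(&:first).transpose
  # Array#replace keeps the caller's array objects and swaps their contents.
  new_arrys.each_index { |k| arrys[k].replace(new_arrys[k]) }
end

keys   = [3, 1, 2]
labels = [:c, :a, :b]
sort_together(keys, labels)
keys   #=> [1, 2, 3]
labels #=> [:a, :b, :c]
```

Note that transpose raises an IndexError if the arrays differ in length, so this assumes all arrays are the same size, as stated at the top of the question.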
I have an array of 5000+ numbers, which is previously sorted, and this function adds about 1/2 second per chart.
Unfortunately, algorithms implemented in Ruby can become quite slow. It's often much faster to delegate the work to the built-in methods that are implemented in C, even if it comes with an overhead.
To sort a nested array, you could transpose it, then sort_by its first element, and transpose again afterwards:
arrays.transpose.sort_by(&:first).transpose
It works like this:
arrays #=> [[3, 1, 2], [:c, :a, :b]]
.transpose #=> [[3, :c], [1, :a], [2, :b]]
.sort_by(&:first) #=> [[1, :a], [2, :b], [3, :c]]
.transpose #=> [[1, 2, 3], [:a, :b, :c]]
And although it creates several temporary arrays along the way, the result seems to be an order of magnitude faster than the "unsorted" variant:
unsorted 0.035297 0.000106 0.035403 ( 0.035458)
sorted 0.474134 0.003065 0.477199 ( 0.480667)
transpose 0.001572 0.000082 0.001654 ( 0.001655)
In the long run, you could try to implement your algorithm as a C extension.
I confess I don't fully understand the question and don't have the time to study the code at the link, but it seems that you have one sorted array that you are repeatedly mutating only slightly, and with each change you may mutate several other arrays, each a little or a lot. After each set of mutations you re-sort the first array and then rearrange each of the other arrays consistent with the changes in indices of elements in the first array.
If, for example, the first array were
arr = [2,4,6,8,10]
and the change to arr were to replace the element at index 1 (4) with 9 and the element at index 3 (8) with 3, arr would become [2,9,6,3,10], which, after re-sorting, would be [2,3,6,9,10]. We could do that as follows:
new_arr, indices = [2,9,6,3,10].each_with_index.sort.transpose
#=> [[2, 3, 6, 9, 10], [0, 3, 2, 1, 4]]
Therefore,
new_arr
#=> [2, 3, 6, 9, 10]
indices
#=> [0, 3, 2, 1, 4]
the intermediate calculation being
[2,9,6,3,10].each_with_index.sort
#=> [[2, 0], [3, 3], [6, 2], [9, 1], [10, 4]]
Considering that
new_arr == [2,9,6,3,10].values_at(*indices)
#=> true
we see that each of the other arrays, after having been mutated, can be sorted to conform with the sorting of indices in the first array with the following method, which is quite fast.
def sort_like_first(a, indices)
  a.values_at(*indices)
end
For example,
a = [5,4,3,1,2]
a.replace(sort_like_first a, indices)
a #=> [5, 1, 3, 4, 2]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first a, indices)
a #=> ["dog", "pig", "cow", "cat", "owl"]
In fact, it's not necessary to sort each of the other arrays until they are required in the calculations.
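Putting the pieces above together, a minimal sketch might look like the following. The helper name reorder_all is hypothetical; it simply derives the index permutation from the first array and applies values_at to every array.

```ruby
# Hypothetical helper: sort the first array and reorder the rest to match.
def reorder_all(arrays)
  first = arrays.first
  # Pair each value with its index, sort the pairs, keep only the indices.
  indices = first.each_with_index.sort.map(&:last)
  # Apply the same permutation to every array (returns new arrays).
  arrays.map { |a| a.values_at(*indices) }
end

result = reorder_all([[2, 9, 6, 3, 10], [5, 4, 3, 1, 2], %w|dog cat cow pig owl|])
result[0] #=> [2, 3, 6, 9, 10]
result[1] #=> [5, 1, 3, 4, 2]
result[2] #=> ["dog", "pig", "cow", "cat", "owl"]
```

Because the pairs are compared as [value, index], equal values keep their original relative order.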
I would now like to consider a special case, namely, when only a single element in the first array is to be changed.
Suppose (as before)
arr = [2,4,6,8,10]
and the element at index 3 (8) is to be replaced with 5, resulting in [2,4,6,5,10]. A fast sort can be done with the following method, which employs a binary search.
def new_indices(arr, replace_idx, replace_val)
  new_loc = arr.bsearch_index { |n| n >= replace_val } || arr.size
  indices = (0..arr.size-1).to_a
  index_removed = indices.delete_at(replace_idx)
  new_loc -= 1 if new_loc > replace_idx
  indices.insert(new_loc, index_removed)
end
arr.bsearch_index { |n| n >= replace_val } returns nil if n >= replace_val #=> false for all n. It is for that reason I have tacked on || arr.size.
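The nil case is easy to check in isolation. This short sketch uses the same sample array; the local names inside, past_end and insert_at are only illustrative.

```ruby
arr = [2, 4, 6, 8, 10]

# The block is true from index 2 onwards, so the first such index is returned.
inside = arr.bsearch_index { |n| n >= 5 }     #=> 2

# The block is false for every element, so nil is returned.
past_end = arr.bsearch_index { |n| n >= 12 }  #=> nil

# Falling back to arr.size places the new value after the last element.
insert_at = past_end || arr.size              #=> 5
```

Array#bsearch_index needs Ruby 2.3 or later and assumes arr is sorted with respect to the block.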
See Array#bsearch_index, Array#delete_at and Array#insert.
Let's try it. If
arr = [2,4,6,8,10]
replace_idx = 3
replace_val = 5
then
indices = new_indices(arr, replace_idx, replace_val)
#=> [0, 1, 3, 2, 4]
Only now can we replace the element of arr at index replace_idx.
arr[replace_idx] = replace_val
arr
#=> [2, 4, 6, 5, 10]
We see that the re-sorted array is as follows.
arr.values_at(*indices)
#=> [2, 4, 5, 6, 10]
The other arrays are sorted as before, using sort_like_first:
a = [5,4,3,1,2]
a.replace(sort_like_first(a, indices))
#=> [5, 4, 1, 3, 2]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first(a, indices))
#=> ["dog", "cat", "pig", "cow", "owl"]
Here's a second example.
arr = [2,4,6,8,10]
replace_idx = 3
replace_val = 12
indices = new_indices(arr, replace_idx, replace_val)
#=> [0, 1, 2, 4, 3]
arr[replace_idx] = replace_val
arr
#=> [2, 4, 6, 12, 10]
The first array sorted is therefore
arr.values_at(*indices)
#=> [2, 4, 6, 10, 12]
The other arrays are sorted as follows.
a = [5,4,3,1,2]
a.replace(sort_like_first a, indices)
a #=> [5, 4, 3, 2, 1]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first a, indices)
a #=> ["dog", "cat", "cow", "owl", "pig"]
I'm using Ruby 1.9.3 and I want to remove values from an array that appear more than once. I have the following:
arr = [1,2,2,3,4,5,6,6,7,8,9]
and the result should be:
arr = [1,3,4,5,7,8,9].
What would be the simplest, shortest Ruby code to accomplish this?
As @Sergio Tulentsev mentioned, a combination of group_by and select will do the trick.
Here you go
arr.group_by{|i| i}.select{|k, v| v.count.eql?(1)}.keys
We can achieve this by array select and count methods
arr.select { |x| arr.count(x) == 1 } #=> [1, 3, 4, 5, 7, 8, 9]
def remove_repeated(elements)
  encountered = Hash.new(0)
  # Count how many times each element occurs in the array.
  elements.each { |e| encountered[e] += 1 }
  # Keep only the elements that were encountered exactly once.
  elements.select { |e| encountered[e] == 1 }
end
I want to remove values from an array that appear more than once.
To check whether an element appears more than once, use Array#count.
To remove elements conditionally, use Array#delete_if.
Below is an example:
> arr.delete_if{|e| arr.count(e) > 1}
#=> [1, 3, 4, 5, 7, 8, 9]
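One caveat: delete_if mutates arr while the block is still calling arr.count on it, so the outcome may depend on how the interpreter moves elements around during deletion, particularly when the repeated values are not adjacent. A more defensive sketch counts first and filters second; the names counts and singles are just illustrative.

```ruby
arr = [1, 2, 2, 3, 4, 5, 6, 6, 7, 8, 9]

# Count every element before anything is removed.
counts = arr.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }

# Build the filtered array without touching arr during iteration.
singles = arr.reject { |e| counts[e] > 1 }
singles #=> [1, 3, 4, 5, 7, 8, 9]

# Only now update arr in place, if in-place mutation is required.
arr.replace(singles)
```

This also does a single counting pass instead of calling count once per element.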
Option2:
> arr.group_by{|e| e}.delete_if{|_,v| v.size > 1}.keys
#=> [1, 3, 4, 5, 7, 8, 9]
First, group the elements by their own value (which returns a hash of key/value pairs), then delete the entries whose value holds more than one element, and finally take the keys.
I would be inclined to use a counting hash.
Code
def single_instances(arr)
  arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }.
    select { |_,v| v == 1 }.
    keys
end
Example
single_instances [1,2,2,3,4,5,6,6,7,8,9]
#=> [1, 3, 4, 5, 7, 8, 9]
Explanation
The steps are as follows.
arr = [1,2,2,3,4,5,6,6,7,8,9]
f = Hash.new(0)
#=> {}
f is created with the method Hash::new with an argument of zero. That means that if f does not have a key k, f[k] returns zero (and does not alter f).
enum = arr.each_with_object(f)
#=> #<Enumerator: [1, 2, 2, 3, 4, 5, 6, 6, 7, 8, 9]:each_with_object({})>
h = enum.each { |e,h| h[e] += 1 }
#=> {1=>1, 2=>2, 3=>1, 4=>1, 5=>1, 6=>2, 7=>1, 8=>1, 9=>1}
g = h.select { |_,v| v == 1 }
#=> {1=>1, 3=>1, 4=>1, 5=>1, 7=>1, 8=>1, 9=>1}
g.keys
#=> [1, 3, 4, 5, 7, 8, 9]
In calculating g, Hash#select (which returns a hash), not Enumerable#select (which returns an array), is executed. I've used an underscore for the first block variable (a key in h) to signify that it is not used in the block calculation.
Let's look more carefully at the calculation of h. The first value is generated by the enumerator enum and passed to the block, and the block variables are assigned values using a process called disambiguation or decomposition.
e, h = enum.next
#=> [1, {}]
e #=> 1
h #=> {}
so the block calculation is
h[e] += 1
#=> h[e] = h[e] + 1 => 0 + 1 => 1
h[e] on the right side of the equality (using the method Hash#[], as contrasted with Hash#[]= on the left side of the equality), returns 0, the default value, because h has no key e #=> 1.
The next two elements of enum are passed to the block and the following calculations are performed.
e, h = enum.next
#=> [2, {1=>1}]
h[e] += 1
#=> h[e] = h[2] + 1 => 0 + 1 => 1
Notice that h has been updated.
e, h = enum.next
#=> [2, {1=>1, 2=>1}]
h[e] += 1
#=> h[e] = h[e] + 1 => h[2] + 1 => 1 + 1 => 2
h #=> {1=>1, 2=>2}
This time, because h already has a key e #=> 2, the hash's default value is not used.
The remaining calculations are similar.
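On Ruby 2.7 or later (newer than the 1.9.3 mentioned in the question), the counting hash can be built with Enumerable#tally instead. This is a sketch of the same idea, not a change to the method above.

```ruby
arr = [1, 2, 2, 3, 4, 5, 6, 6, 7, 8, 9]

# tally returns a hash of element => number of occurrences.
counts = arr.tally
#=> {1=>1, 2=>2, 3=>1, 4=>1, 5=>1, 6=>2, 7=>1, 8=>1, 9=>1}

singles = counts.select { |_, v| v == 1 }.keys
#=> [1, 3, 4, 5, 7, 8, 9]
```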
Use [Array#difference] instead
A simpler way is to use the method Array#difference.
class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
Suppose
arr = [1,2,2,3,4,2,5,6,6,7,8,9]
Note the addition of a third 2.
arr - arr.difference(arr.uniq)
# => [1, 3, 4, 5, 7, 8, 9]
The three steps are as follows.
a = arr.uniq
#=> [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = arr.difference(a)
#=> [2, 2, 6] (elements that appear more than once)
arr - b
# => [1, 3, 4, 5, 7, 8, 9]
I've proposed that Array#difference be added to the Ruby core, but there seems to be little interest in doing so.
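Since Ruby 2.6 ships its own Array#difference, which behaves like Array#- and removes every occurrence, reopening Array as above would override that built-in. A sketch that avoids the clash is a standalone method; the name multiset_difference is hypothetical.

```ruby
# Remove from a one occurrence of each element for every occurrence in b.
def multiset_difference(a, b)
  remaining = b.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  a.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1   # use up one occurrence from b
      true                # and drop this element of a
    else
      false
    end
  end
end

arr = [1, 2, 2, 3, 4, 2, 5, 6, 6, 7, 8, 9]
repeats = multiset_difference(arr, arr.uniq)  #=> [2, 2, 6]
singles = arr - repeats                       #=> [1, 3, 4, 5, 7, 8, 9]
```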
I have an array with a varying number of elements (0..n). An example could be:
a = [0,1,2,3,4,5,6,7,8,9]
In an iterative process, I would like to move a cursor in the array and slice out a max number of elements. If I reach the "end" of the array, it should start over and pick from the beginning again:
Something like this:
4.times do |i|
  a.slice(i * 3, 3)
end
# i = 0 => [0,1,2]
# i = 1 => [3,4,5]
# i = 2 => [6,7,8]
# i = 3 => [9,0,1]
# ...
However the last output i = 3 produces [9] as .slice does not do exactly what I want.
You could use cycle:
a.cycle.each_slice(3).take(4)
#=> [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]]
You could use Array#rotate, and then take the first 3 elements each time:
4.times.map { |i| a.rotate(i*3)[0..2] }
# => [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]]
Possible solution:
4.times { |i| p a.values_at(*(i*3..i*3+2).map {|e| e % 10 }) }
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 0, 1]
9%10 = 9, 10%10 = 0, 11%10 = 1. So you will get the desired output.
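The hard-coded 10 is the array length, so the same modulo idea can be sketched for any array. The helper name circular_slice is hypothetical.

```ruby
# Return length elements starting at start, wrapping around the end of ary.
def circular_slice(ary, start, length)
  return [] if ary.empty?  # avoid a modulo by zero
  Array.new(length) { |k| ary[(start + k) % ary.size] }
end

a = (0..9).to_a
chunks = 4.times.map { |i| circular_slice(a, i * 3, 3) }
#=> [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]]
```

Because every index is reduced modulo the size, this also keeps wrapping when length is larger than the array.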
This might break some code, so be careful.
class Array
  alias_method :old_slice, :slice
  def slice(o, c)
    ret = old_slice(o % size, c)
    if ret.size != c
      ret += old_slice(0, c - ret.size)
    end
    ret
  end
end
a = [0,1,2,3,4,5,6,7,8,9]
4.times do |i|
  p a.slice(i * 3, 3)
end
As Stephan points out it would be better to give this method a different name, or it might be even better to create a CircularArray class.
class CircularArray < Array
  alias_method :straight_slice, :slice
  def slice(o, c)
    ret = straight_slice(o % size, c)
    if ret.size != c
      ret += straight_slice(0, c - ret.size)
    end
    ret
  end
end
I'm using Ruby 2.4. I have an array of strings that are all numbers. I want to count the number of elements in the array that are unique and that are also greater than the element before them (I consider the first array element already greater than its non-existent predecessor). So I tried
data_col = ["3", "6", "10"]
#=> ["3", "6", "10"]
data_col.map { |string| string.to_i.to_s == string ? string.to_i : -2 }.each_cons(2).select { |a, b| a > b && data_col.count(a) == 1 }.count
#=> 0
but the result is zero, despite the fact that all the elements in my array satisfy my criteria. How can I improve the way I count this?
require 'set'
def nbr_uniq_and_bigger(arr)
  processed = Set.new
  arr.each_cons(2).with_object(Set.new) do |(n,m),keepers|
    if processed.add?(m)
      keepers << m if m > n
    else
      keepers.delete(m)
    end
  end.size + (processed.include?(arr.first) ? 0 : 1)
end
nbr_uniq_and_bigger [1, 2, 6, 3, 2]
#=> 2
nbr_uniq_and_bigger [1, 2, 1, 2, 1, 2]
#=> 0
See Set#add?.
Note the line keepers.delete(m) could be written
keepers.delete(m) if keepers.include?(m)
but attempting to delete an element that is not in the set does no harm.
There are a few things wrong here:
a > b seems like the opposite of what you want to test. That should probably be b > a.
If I followed properly, I think data_col.count(a) is always going to be zero, since a is an integer and data_col contains only strings. Also, I'm not sure you want to be looking for a... b is probably the right element to look for.
I'm not sure you're ever counting the first element here. (You said you consider the first element to be greater than its non-existent predecessor, but where in your code does that happen?)
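The first two points can be checked directly with the data from the question. The variable names ints and pairs are only illustrative.

```ruby
data_col = ["3", "6", "10"]
ints = data_col.map(&:to_i)        #=> [3, 6, 10]
pairs = ints.each_cons(2).to_a     #=> [[3, 6], [6, 10]]

# a is the earlier element, so a > b never holds for increasing data.
wrong_way = pairs.select { |a, b| a > b }  #=> []
right_way = pairs.select { |a, b| b > a }  #=> [[3, 6], [6, 10]]

# An Integer never equals a String, so counting it in data_col gives 0.
count_in_strings = data_col.count(3)  #=> 0
count_in_ints    = ints.count(3)      #=> 1
```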
Here's some code that works:
def foo(x)
  ([nil] + x).each_cons(2).select { |a, b| (a == nil or b > a) and x.count(b) == 1 }.count()
end
p foo([3, 6, 10]) # 3
p foo([3, 6, 10, 1, 6]) # 2
(If you have an array of strings, feel free to do .map { |s| s.to_i } first.)
One more solution:
def uniq_and_bigger(data)
  counts = data.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 } #precalculate
  data.each_cons(2).with_object([]) do |(n,m), a|
    a << m if m > n && counts[m] == 1
  end.size + (counts[data[0]] == 1 ? 1 : 0)
end
uniq_and_bigger([3, 6, 10, 1, 6])
=> 2
uniq_and_bigger([1, 2, 1, 2, 1, 2])
=> 0
Yet another solution. It's O(n) and it returns the desired result for [3, 6, 10].
It uses slice_when :
def unique_and_increasing(array)
  duplicates = array.group_by(&:itself).select{ |_, v| v.size > 1 }.keys
  (array.slice_when{ |a, b| a < b }.map(&:first) - duplicates).size
end
p unique_and_increasing [3, 6, 10]
# 3
p unique_and_increasing [3, 6, 10, 1, 6]
# 2
p unique_and_increasing [1, 2, 1, 2, 1, 2]
# 0
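To see what slice_when contributes, here are the intermediate values for the second example. The variable names slices and candidates are only illustrative.

```ruby
array = [3, 6, 10, 1, 6]

# A new slice starts wherever an element is bigger than its predecessor.
slices = array.slice_when { |a, b| a < b }.to_a
#=> [[3], [6], [10, 1], [6]]

# The head of each slice is the first element or an element that increased.
candidates = slices.map(&:first)
#=> [3, 6, 10, 6]

duplicates = array.group_by(&:itself).select { |_, v| v.size > 1 }.keys
#=> [6]

count = (candidates - duplicates).size
#=> 2
```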
There are two arrays:
A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
B = [3, 4, 1, 5, 2, 6]
I want to sort B so that the elements of B that exist in A appear in the same order as they do in A.
The desired sorted resulted would be
B #=> [1, 2, 3, 4, 5, 6]
I have tried to do
B = B.sort_by { |x| A.index }
but it does not work.
This question differs from the possible duplicates because it deals with presence of elements in the corresponding array and no hashes are present here.
It works perfectly:
▶ A = [1,3,2,6,4,5,7,8,9,10]
▶ B = [3,4,1,5,2,6]
▶ B.sort_by &A.method(:index)
#⇒ [1, 3, 2, 6, 4, 5]
If there could be elements in B that are not present in A, use this:
▶ B.sort_by { |e| A.index(e) || Float::INFINITY }
I would start by checking what elements from B exist in A :
B & A
and then sort it:
(B & A).sort_by { |e| A.index(e) }
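Each call to index scans the reference array, so for large inputs it may be worth building a lookup hash once. This is a sketch with lowercase stand-ins (order, items) for A and B and a made-up extra element, 42, that is absent from the reference.

```ruby
order = [1, 3, 2, 6, 4, 5, 7, 8, 9, 10]
items = [3, 4, 1, 5, 2, 6, 42]

# Map each reference element to its position; built in one pass.
position = order.each_with_index.to_h

# Elements missing from the reference sort to the end.
sorted = items.sort_by { |e| position.fetch(e, Float::INFINITY) }
#=> [1, 3, 2, 6, 4, 5, 42]
```

If the reference array contains repeated values, the hash keeps the last position rather than the first, unlike index.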
First consider the case where every element of B is in A, as with the question's example:
A = [1,2,3,4,5,6,7,8,9,10]
B = [3,6,1,5,1,2,1,6]
One could write the following, which requires only a single pass through A (to construct g; see footnote 1) and a single pass through B.
g = A.each_with_object({}) { |n,h| h[n] = 1 }
#=> {1=>1, 2=>1, 3=>1, 4=>1, 5=>1, 6=>1, 7=>1, 8=>1, 9=>1, 10=>1}
B.each_with_object(g) { |n,h| h[n] += 1 }.flat_map { |k,v| [k]*(v-1) }
#=> [1, 1, 1, 2, 3, 5, 6, 6]
If there is no guarantee that all elements of B are in A, and any that are not are to be placed at the end of the sorted array, one could change the calculation of g slightly.
g = (A + (B-A)).each_with_object({}) { |n,h| h[n] = 1 }
This requires one more pass through A and through B.
Suppose, for example,
A = [2,3,4,6,7,8,9]
and B is unchanged. Then,
g = (A + (B-A)).each_with_object({}) { |n,h| h[n] = 1 }
#=> {2=>1, 3=>1, 4=>1, 6=>1, 7=>1, 8=>1, 9=>1, 1=>1, 5=>1}
B.each_with_object(g) { |n,h| h[n] += 1 }.flat_map { |k,v| [k]*(v-1) }
#=> [2, 3, 6, 6, 1, 1, 1, 5]
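The two cases can be folded into one method. This is only a sketch: the name sort_like is hypothetical, and it starts the counts at 0 rather than 1, so no subtraction is needed at the end.

```ruby
# Order items like reference; items absent from reference go last.
def sort_like(reference, items)
  # Keys are inserted in reference order, then any leftovers from items.
  counts = (reference + (items - reference)).each_with_object({}) { |n, h| h[n] = 0 }
  items.each { |n| counts[n] += 1 }
  # Hashes preserve insertion order, so this emits elements in reference order.
  counts.flat_map { |k, v| [k] * v }
end

b = [3, 6, 1, 5, 1, 2, 1, 6]
all_present  = sort_like([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], b)
#=> [1, 1, 1, 2, 3, 5, 6, 6]
some_missing = sort_like([2, 3, 4, 6, 7, 8, 9], b)
#=> [2, 3, 6, 6, 1, 1, 1, 5]
```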
This solution demonstrates the value of a controversial change to hash properties that was made in Ruby v1.9: hashes would thereafter be guaranteed to maintain key-insertion order.
1 I expect one could write g = A.product([1]).to_h, but the doc Array#to_h does not guarantee that the keys in the hash returned will have the same order as they do in A.
You just missed x in A.index, so the query should be:
B = B.sort_by { |x| A.index(x) }