Ruby Array: opposite of `&` for an array - arrays

In Ruby you can intersect two arrays using the & operator.
I'm trying to obtain the remainder of the intersection.
In a simple case, - is sufficient:
array_1 = [0, 1]
array_2 = [0]
array_1 - array_2 => [1]
Now imagine we have 0 appearing multiple times in the first array
array_1 = [0, 0, 1]
array_2 = [0]
array_1 - array_2 => [1]
I would like to know the easiest way to obtain the difference between the first array and the intersection of the first array and the second array
array_1 = [0, 0, 1]
array_2 = [0]
array_1 ??? array_2 => [0, 1]

I have proposed that the method I think you want be added to the Ruby core. See the link for examples of its use.
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
a = [1,2,3,4,3,2,2,4]
b = [2,3,4,4,4]
a.difference b
#=> [1, 3, 2, 2]
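The same counting idea can be sketched without reopening Array. The helper name multiset_difference below is hypothetical (it is not part of Ruby core); it removes one occurrence from the first array for each occurrence in the second and keeps the original order:

```ruby
# Hypothetical helper name, not a core method.
# Removes one element of `a` for each matching element of `b`.
def multiset_difference(a, b)
  # Count how many of each value still have to be removed.
  remaining = b.each_with_object(Hash.new(0)) { |e, counts| counts[e] += 1 }
  a.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1
      true   # drop this occurrence
    else
      false  # keep it
    end
  end
end

p multiset_difference([0, 0, 1], [0])
# prints [0, 1]
p multiset_difference([1, 2, 3, 4, 3, 2, 2, 4], [2, 3, 4, 4, 4])
# prints [1, 3, 2, 2]
```

The first call uses the arrays from the question; the second reproduces the example above.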


better multiple array sort, based on first array

I'm working to update the SVG::Graph gem, and have made many improvements to my version, but have found a bottleneck with multiple array sorting.
There is a "sort_multiple" function built in, which keeps an array of arrays (all of equal size) sorted by the first array in the group.
The issue I have is that this sort works well on truly random data, and really badly on sorted, or almost sorted data:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
if lo < hi
p = partition(arrys,lo,hi)
sort_multiple(arrys, lo, p-1)
sort_multiple(arrys, p+1, hi)
end
arrys
end
def partition( arrys, lo, hi )
p = arrys[0][lo]
l = lo
z = lo+1
while z <= hi
if arrys[0][z] < p
l += 1
arrys.each { |arry| arry[z], arry[l] = arry[l], arry[z] }
end
z += 1
end
arrys.each { |arry| arry[lo], arry[l] = arry[l], arry[lo] }
l
end
This routine appears to use a variant of the Lomuto partition scheme described on Wikipedia: https://en.wikipedia.org/wiki/Quicksort#Lomuto_partition_scheme
I have an array of 5000+ numbers, which is already sorted, and this function adds about 1/2 second per chart.
I have modified the "sort_multiple" routine with the following:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
first = arrys.first
return arrys if first == first.sort
if lo < hi
...
This has "fixed" the problem with sorted data, but I was wondering whether there is any way to use the faster sort functions built into Ruby to make this sort much quicker. For example, do you think I could use TSort to speed this up? https://ruby-doc.org/stdlib-2.6.1/libdoc/tsort/rdoc/TSort.html
Looking at my benchmarking, the completely random first group appears to be very fast.
Current benchmarking:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
if lo < hi
p = partition(arrys,lo,hi)
sort_multiple(arrys, lo, p-1)
sort_multiple(arrys, p+1, hi)
end
arrys
end
def partition( arrys, lo, hi )
p = arrys[0][lo]
l = lo
z = lo+1
while z <= hi
if arrys[0][z] < p
l += 1
arrys.each { |arry| arry[z], arry[l] = arry[l], arry[z] }
end
z += 1
end
arrys.each { |arry| arry[lo], arry[l] = arry[l], arry[lo] }
l
end
first = (1..5400).map { rand }
second = (1..5400).map { rand }
unsorted_arrys = [first.dup, second.dup, Array.new(5400), Array.new(5400), Array.new(5400)]
sorted_arrys = [first.sort, second.dup, Array.new(5400), Array.new(5400), Array.new(5400)]
require 'benchmark'
Benchmark.bmbm do |x|
x.report("unsorted") { sort_multiple( unsorted_arrys.map(&:dup) ) }
x.report("sorted") { sort_multiple( sorted_arrys.map(&:dup) ) }
end
results:
Rehearsal --------------------------------------------
unsorted 0.070699 0.000008 0.070707 ( 0.070710)
sorted 0.731734 0.000000 0.731734 ( 0.731742)
----------------------------------- total: 0.802441sec
user system total real
unsorted 0.051636 0.000000 0.051636 ( 0.051636)
sorted 0.715730 0.000000 0.715730 ( 0.715733)
#EDIT#
Final accepted solution:
def sort( *arrys )
new_arrys = arrys.transpose.sort_by(&:first).transpose
new_arrys.each_index { |k| arrys[k].replace(new_arrys[k]) }
end
Unfortunately, algorithms implemented in Ruby can become quite slow. It's often much faster to delegate the work to the built-in methods that are implemented in C, even if it comes with an overhead.
To sort a nested array, you could transpose it, then sort_by its first element, and transpose again afterwards:
arrays.transpose.sort_by(&:first).transpose
It works like this:
arrays #=> [[3, 1, 2], [:c, :a, :b]]
.transpose #=> [[3, :c], [1, :a], [2, :b]]
.sort_by(&:first) #=> [[1, :a], [2, :b], [3, :c]]
.transpose #=> [[1, 2, 3], [:a, :b, :c]]
And although it creates several temporary arrays along the way, the result seems to be an order of magnitude faster than the "unsorted" variant:
unsorted 0.035297 0.000106 0.035403 ( 0.035458)
sorted 0.474134 0.003065 0.477199 ( 0.480667)
transpose 0.001572 0.000082 0.001654 ( 0.001655)
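A timing like the "transpose" row could be reproduced along these lines. This is only a sketch: the helper name sort_by_first is hypothetical, the data sizes are taken from the question's benchmark, and the timings will differ by machine:

```ruby
require 'benchmark'

# Hypothetical helper: returns new arrays, all reordered so the first one is sorted.
def sort_by_first(arrays)
  arrays.transpose.sort_by(&:first).transpose
end

first  = (1..5400).map { rand }
second = (1..5400).map { rand }
unsorted_arrys = [first.dup, second.dup]
sorted_arrys   = [first.sort, second.dup]

Benchmark.bmbm do |x|
  x.report("transpose (unsorted)") { sort_by_first(unsorted_arrys) }
  x.report("transpose (sorted)")   { sort_by_first(sorted_arrys) }
end
```

Unlike the original sort_multiple, this returns new arrays instead of sorting in place. Also note that transposing a group of empty arrays returns [], so the outer shape is lost in that edge case.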
In the long run, you could try to implement your algorithm as a C extension.
I confess I don't fully understand the question and don't have the time to study the code at the link, but it seems that you have one sorted array that you are repeatedly mutating only slightly, and with each change you may mutate several other arrays, each a little or a lot. After each set of mutations you re-sort the first array and then rearrange each of the other arrays consistent with the changes in indices of elements in the first array.
If, for example, the first array were
arr = [2,4,6,8,10]
and the change to arr were to replace the element at index 1 (4) with 9 and the element at index 3 (8) with 3, arr would become [2,9,6,3,10], which, after re-sorting, would be [2,3,6,9,10]. We could do that as follows:
new_arr, indices = [2,9,6,3,10].each_with_index.sort.transpose
#=> [[2, 3, 6, 9, 10], [0, 3, 2, 1, 4]]
Therefore,
new_arr
#=> [2, 3, 6, 9, 10]
indices
#=> [0, 3, 2, 1, 4]
the intermediate calculation being
[2,9,6,3,10].each_with_index.sort
#=> [[2, 0], [3, 3], [6, 2], [9, 1], [10, 4]]
Considering that
new_arr == [2,9,6,3,10].values_at(*indices)
#=> true
we see that each of the other arrays, after having been mutated, can be sorted to conform with the sorting of indices in the first array with the following method, which is quite fast.
def sort_like_first(a, indices)
a.values_at(*indices)
end
For example,
a = [5,4,3,1,2]
a.replace(sort_like_first a, indices)
a #=> [5, 1, 3, 4, 2]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first a, indices)
a #=> ["dog", "pig", "cow", "cat", "owl"]
In fact, it's not necessary to sort each of the other arrays until they are required in the calculations.
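Putting the pieces above together, a whole group of arrays could be reordered in one call. This is a minimal sketch; the method name sort_all_by_first is an assumption and does not appear in the gem:

```ruby
# Hypothetical wrapper: compute the permutation from the first array once,
# then apply it to every array in the group.
def sort_all_by_first(arrays)
  # each_with_index.sort orders [value, index] pairs, so ties keep index order.
  _, indices = arrays.first.each_with_index.sort.transpose
  arrays.map { |a| a.values_at(*indices) }
end

group = [[2, 9, 6, 3, 10], [5, 4, 3, 1, 2], %w[dog cat cow pig owl]]
p sort_all_by_first(group)
# prints [[2, 3, 6, 9, 10], [5, 1, 3, 4, 2], ["dog", "pig", "cow", "cat", "owl"]]
```

Keeping indices around separately, as in the text, would allow the other arrays to be reordered lazily instead of all at once.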
I would now like to consider a special case, namely, when only a single element in the first array is to be changed.
Suppose (as before)
arr = [2,4,6,8,10]
and the element at index 3 (8) is to be replaced with 5, resulting in [2,4,6,5,10]. A fast sort can be done with the following method, which employs a binary search.
def new_indices(arr, replace_idx, replace_val)
new_loc = arr.bsearch_index { |n| n >= replace_val } || arr.size
indices = (0..arr.size-1).to_a
index_removed = indices.delete_at(replace_idx)
new_loc -= 1 if new_loc > replace_idx
indices.insert(new_loc, index_removed)
end
arr.bsearch_index { |n| n >= replace_val } returns nil if n >= replace_val #=> false for all n. It is for that reason I have tacked on || arr.size.
See Array#bsearch_index, Array#delete_at and Array#insert.
Let's try it. If
arr = [2,4,6,8,10]
replace_idx = 3
replace_val = 5
then
indices = new_indices(arr, replace_idx, replace_val)
#=> [0, 1, 3, 2, 4]
Only now can we replace the element of arr at index replace_idx.
arr[replace_idx] = replace_val
arr
#=> [2, 4, 6, 5, 10]
We see that the re-sorted array is as follows.
arr.values_at(*indices)
#=> [2, 4, 5, 6, 10]
The other arrays are sorted as before, using sort_like_first:
a = [5,4,3,1,2]
a.replace(sort_like_first(a, indices))
#=> [5, 4, 1, 3, 2]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first(a, indices))
#=> ["dog", "cat", "pig", "cow", "owl"]
Here's a second example.
arr = [2,4,6,8,10]
replace_idx = 3
replace_val = 12
indices = new_indices(arr, replace_idx, replace_val)
#=> [0, 1, 2, 4, 3]
arr[replace_idx] = replace_val
arr
#=> [2, 4, 6, 12, 10]
The first array sorted is therefore
arr.values_at(*indices)
#=> [2, 4, 6, 10, 12]
The other arrays are sorted as follows.
a = [5,4,3,1,2]
a.replace(sort_like_first a, indices)
a #=> [5, 4, 3, 2, 1]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first a, indices)
a #=> ["dog", "cat", "cow", "owl", "pig"]
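The single-replacement steps above could be wrapped into one call. The sketch below repeats new_indices so that it runs on its own; replace_and_resort! is a hypothetical name, and it assumes the first array is sorted before the call:

```ruby
def new_indices(arr, replace_idx, replace_val)
  new_loc = arr.bsearch_index { |n| n >= replace_val } || arr.size
  indices = (0..arr.size - 1).to_a
  index_removed = indices.delete_at(replace_idx)
  new_loc -= 1 if new_loc > replace_idx
  indices.insert(new_loc, index_removed)
end

# Hypothetical wrapper: replace one element of the (sorted) first array,
# then reorder every array in the group in place.
def replace_and_resort!(arrays, replace_idx, replace_val)
  first = arrays.first
  # The permutation must be computed before the first array is modified.
  indices = new_indices(first, replace_idx, replace_val)
  first[replace_idx] = replace_val
  arrays.each { |a| a.replace(a.values_at(*indices)) }
end

arrays = [[2, 4, 6, 8, 10], [5, 4, 3, 1, 2], %w[dog cat cow pig owl]]
replace_and_resort!(arrays, 3, 5)
p arrays
# prints [[2, 4, 5, 6, 10], [5, 4, 1, 3, 2], ["dog", "cat", "pig", "cow", "owl"]]
```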

Numpy: Comparing array elements within another array

I have a numpy array
X = [[1,2], [3,4], [5,6], [1,2], [5,6]]
I want a numpy array Y = [1, 2, 3, 1, 3], where [1,2] is replaced by 1, [3,4] replaced by 2 and so on. This is for a very large (think millions) X.
Intuition is Y[X == [1,2]] = 1. But this doesn't work.
Here is how to make it work:
Y = np.empty(len(X), dtype=int)  # np.int was removed in NumPy 1.24; the builtin int works
Y[np.all(X == [1, 2], 1)] = 1
To process all the possible values:
s = set(map(tuple, X))
r = np.arange(1, len(s) + 1) # or assign whatever values you want
cond = [np.all(X == v, 1) for v in s]
Y = np.dot(r, cond)

How do I count the number of elements in my array that are unique and are bigger than the element before them?

I'm using Ruby 2.4. I have an array of strings that are all numbers. I want to count the number of elements in the array that are unique and that are also greater than the element before them (I consider the first array element already greater than its non-existent predecessor). So I tried
data_col = ["3", "6", "10"]
#=> ["3", "6", "10"]
data_col.map { |string| string.to_i.to_s == string ? string.to_i : -2 }.each_cons(2).select { |a, b| a > b && data_col.count(a) == 1 }.count
#=> 0
but the result is zero, despite the fact that all the elements in my array satisfy my criteria. How can I improve the way I count this?
require 'set'
def nbr_uniq_and_bigger(arr)
  # Seed both sets with the first element so that a later repeat of it
  # is detected (e.g. [1, 0, 1] should give 0).
  processed = Set[arr.first]
  arr.each_cons(2).with_object(Set[arr.first]) do |(n,m),keepers|
    if processed.add?(m)
      keepers << m if m > n
    else
      keepers.delete(m)
    end
  end.size
end
nbr_uniq_and_bigger [1, 2, 6, 3, 2]
#=> 2
nbr_uniq_and_bigger [1, 2, 1, 2, 1, 2]
#=> 0
See Set#add?.
Note the line keepers.delete(m) could be written
keepers.delete(m) if keepers.include?(m)
but attempting to delete an element not in the set does no harm.
There are a few things wrong here:
a > b seems like the opposite of what you want to test. That should probably be b > a.
If I followed properly, I think data_col.count(a) is always going to be zero, since a is an integer and data_col contains only strings. Also, I'm not sure you want to be looking for a... b is probably the right element to look for.
I'm not sure you're ever counting the first element here. (You said you consider the first element to be greater than its non-existent predecessor, but where in your code does that happen?)
Here's some code that works:
def foo(x)
([nil] + x).each_cons(2).select { |a, b| (a == nil or b > a) and x.count(b) == 1 }.count()
end
p foo([3, 6, 10]) # 3
p foo([3, 6, 10, 1, 6]) # 2
(If you have an array of strings, feel free to do .map { |s| s.to_i } first.)
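Applied to the string data from the question, that conversion step might look like the sketch below. It repeats foo so that it runs on its own. The conversion matters because strings compare lexicographically, so "10" > "6" is false:

```ruby
def foo(x)
  ([nil] + x).each_cons(2).select { |a, b| (a.nil? || b > a) && x.count(b) == 1 }.count
end

data_col = ["3", "6", "10"]

p "10" > "6"                   # prints false (string comparison)
p foo(data_col.map(&:to_i))    # prints 3
```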
One more solution:
def uniq_and_bigger(data)
counts = data.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 } #precalculate
data.each_cons(2).with_object([]) do |(n,m), a|
a << m if m > n && counts[m] == 1
end.size + (counts[data[0]] == 1 ? 1 : 0)
end
uniq_and_bigger([3, 6, 10, 1, 6])
=> 2
uniq_and_bigger([1, 2, 1, 2, 1, 2])
=> 0
Yet another solution. It's O(n) and it returns the desired result for [3, 6, 10].
It uses slice_when:
def unique_and_increasing(array)
duplicates = array.group_by(&:itself).select{ |_, v| v.size > 1 }.keys
(array.slice_when{ |a, b| a < b }.map(&:first) - duplicates).size
end
p unique_and_increasing [3, 6, 10]
# 3
p unique_and_increasing [3, 6, 10, 1, 6]
# 2
p unique_and_increasing [1, 2, 1, 2, 1, 2]
# 0

Subtract arrays with frequency [duplicate]

I'm trying to subtract an array from another array taking frequency into account, like this:
[1,2,2,2] some_code [1,2] # => [2,2]
What's the easiest way to accomplish this?
Using - removes all occurrences of the elements in the second array:
[1,2,2,2] - [1,2] # => []
a1 = [1,2,2,2]
a2 = [1,2]
a2.each { |e| (idx = a1.find_index e) && (a1.delete_at idx) }
a1
#⇒ [2, 2]
Here we iterate the second array and delete elements from the first one, once per iteration, if those were found.
The first found element will be deleted.
a = [1, 2, 2, 2]
b = [1, 2]
ha = a.each_with_object(Hash.new(0)){|e, h| h[e] += 1}
# => {1=>1, 2=>3}
hb = b.each_with_object(Hash.new(0)){|e, h| h[e] += 1}
# => {1=>1, 2=>1}
(ha.keys | hb.keys).flat_map{|k| Array.new([ha[k] - hb[k], 0].max, k)}
# => [2, 2]
If I understood the problem correctly, that you wish to delete single occurrence of each element of array b from array a, here is one way to do this:
a.keep_if {|i| !b.delete(i)}
#=> [2,2]
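PS: Both arrays a and b are mutated by the above code, so you may want to use dup to create copies if you want to retain the original arrays. Note also that Array#delete removes every matching element from b at once, so this only behaves as intended when b contains no repeated values.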
def subtract arr_a, arr_b
arr_b.each do |b|
idx = arr_a.index(b)
arr_a.delete_at(idx) unless idx.nil?
end
end
Output:
a = [1,2,2,2]
b = [1,2]
subtract a, b
puts "a: #{a}"
# => a: [2, 2]
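Several of the answers above modify their inputs. A non-mutating variant of the same index/delete_at approach could be sketched as follows; the name subtract_once is hypothetical:

```ruby
# Hypothetical helper: returns a new array and leaves both arguments unchanged.
def subtract_once(arr_a, arr_b)
  remaining = arr_a.dup
  arr_b.each do |b|
    idx = remaining.index(b)
    remaining.delete_at(idx) if idx   # remove only the first match
  end
  remaining
end

a = [1, 2, 2, 2]
b = [1, 2]
p subtract_once(a, b)   # prints [2, 2]
p a                     # prints [1, 2, 2, 2]
```

Because each element of arr_b triggers a linear search, this is quadratic in the worst case; the counting-hash answer above scales better for large arrays.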

Ruby - print result of inject sum on array

Part of the code below sums the elements of an array. How can I print the resulting sum of the array?
#!/usr/bin/ruby
a = [ 1, 2, 3, 4]
b = a.map { |x| x*x }
c = a.select { |x| x%2== 0 }
puts a.inject do | sum,x |
sum + x
end
puts a.inspect
puts b.inspect
puts c.inspect
The sum of a can be printed by wrapping the whole inject block in parentheses, making the resulting output the argument of puts:
a = [1, 2, 3, 4]
puts (
a.inject do |sum, x|
sum + x
end
)
# => 10
The above can be cleaned up a bit by assigning the sum of the array to a more descriptive variable, and/or by using the shorter inject syntax for summing. Your code could then look something like this:
a = [1, 2, 3, 4]
b = a.map { |x| x * x }
c = a.select { |x| x % 2 == 0 }
sum_a = a.inject(:+)
puts a.inspect
puts b.inspect
puts c.inspect
puts sum_a
# => [1, 2, 3, 4]
# => [1, 4, 9, 16]
# => [2, 4]
# => 10
Hope it helps!
Update:
As Cary pointed out in the comment below, additional improvements include condensing the c variable assignment to use a.select(&:even?) for filtering out integers divisible by 2, and using p [variable] instead of puts [variable].inspect.
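With those suggestions applied, the script might look like the sketch below. It also uses braces for the block, since a do ... end block after puts a.inject binds to puts rather than to inject. Array#sum is an addition not mentioned in the answer and assumes Ruby 2.4 or later:

```ruby
a = [1, 2, 3, 4]
b = a.map { |x| x * x }
evens = a.select(&:even?)   # same result as a.select { |x| x % 2 == 0 }

# Braces bind to inject, so puts receives the finished sum.
puts a.inject { |sum, x| sum + x }

total = a.sum               # assumption: Ruby 2.4+, equivalent to a.inject(:+) here

p a
p b
p evens
p total
```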
