Splitting a range into X groups - arrays

I need to split a range into X number of groups and I am having difficulties finding a way without using arrays since these ranges can be very large.
My current solution is to create an array out of the range and then call each_slice on it with some math to split the data into X number of groups more or less the same size depending on how many groups there are.
irb(main):026:0> a = (0..10)
=> 0..10
irb(main):027:0> a.each_slice( (a.size/3.0).round ).to_a
=> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10]]
irb(main):028:0> a.each_slice( (a.size/5.0).round ).to_a
=> [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10]]
The problem with this is that when a range is excessively large the application will hang because of the computation it takes to split the array.
All I really need is an array in this format (taking the a.size/3.0 3 group example into account):
[0..3, 4..7, 8..10]
So I may iterate the array to pass them to the set_range method in the Net::HTTP library.
The ranges I am dealing with are as large or larger than 0..46000000 since I am dealing with file sizes in bytes.
Any help would be appreciated.

Like this?
def split_ranges(amount, max)
  (0...amount).collect{|i| (i * max / amount)...((i+1) * max / amount)}
end
p split_ranges(3, 46000000)
Output:
[0...15333333, 15333333...30666666, 30666666...46000000]
Edit: (OP request)
def split_ranges(amount, max)
  (0...amount).collect{|i| (i * (max + 1) / amount)..((i + 1) * (max + 1) / amount - 1)}
end
p split_ranges(3, 46000000)
Output:
[0..15333332, 15333333..30666666, 30666667..46000000]
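Since the question mentions passing each range to set_range, here is a minimal sketch of how the inclusive ranges above could be turned into Range headers. The path '/big-file.bin' is a made-up placeholder, and nothing is actually sent over the network; the requests are only built so the headers can be inspected.

```ruby
require 'net/http'

# Same arithmetic as the inclusive split_ranges above.
def split_ranges(amount, max)
  (0...amount).collect do |i|
    (i * (max + 1) / amount)..((i + 1) * (max + 1) / amount - 1)
  end
end

# '/big-file.bin' is a hypothetical path; no request is sent here.
requests = split_ranges(3, 46_000_000).map do |byte_range|
  request = Net::HTTP::Get.new('/big-file.bin')
  request.set_range(byte_range) # fills in the Range header from the Ruby range
  request
end

range_headers = requests.map { |request| request['Range'] }
puts range_headers
# bytes=0-15333332
# bytes=15333333-30666666
# bytes=30666667-46000000
```

Only a handful of small Range objects are created, so the size of the file does not matter.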

class Range
  def each_subrange(n)
    return to_enum(:each_subrange, n) unless block_given?
    range_size = size
    range_begin = self.begin
    n.times do |i|
      yield range_begin + range_size * i / n .. range_begin + range_size * (i + 1) / n - 1
    end
  end
end
a = 0..46000000
# without a block
puts a.each_subrange(3).to_a
# with a block
a.each_subrange(3) do |r|
puts r
end

This will ensure that no range differs in size by more than one from the size of any other range:
def split_it(r, n)
  return [r] if n == 1
  last = r.first - 1 + (r.last-r.first+1)/n
  [r.first..last].concat(split_it(last+1..r.last, n-1))
end
r = 0..46000000
split_it(r, 3)
#=> [0..15333332, 15333333..30666666, 30666667..46000000]
split_it(r, 3).map(&:size)
#=> [15333333, 15333334, 15333334]
split_it(r, 4)
#=> [0..11499999, 11500000..22999999, 23000000..34499999,
# 34500000..46000000]
split_it(r, 4).map(&:size)
# => [11500000, 11500000, 11500000, 11500001]
split_it(r, 5)
#=> [0..9199999, 9200000..18399999, 18400000..27599999,
# 27600000..36799999, 36800000..46000000]
split_it(r, 5).map(&:size)
#=> [9200000, 9200000, 9200000, 9200000, 9200001]
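The same "sizes differ by at most one" guarantee can also be had without recursion. The following is only a sketch, assuming an inclusive integer range and n no larger than the range size; split_evenly is a name made up for this example. It hands the remainder of the division out one element at a time to the first groups, so the larger groups come first rather than last.

```ruby
# Hypothetical helper: splits an inclusive integer range into n subranges
# whose sizes differ by at most one.
def split_evenly(range, n)
  size = range.last - range.first + 1
  base, extra = size.divmod(n) # the first `extra` groups get one more element
  start = range.first
  Array.new(n) do |i|
    len = base + (i < extra ? 1 : 0)
    sub = start..(start + len - 1)
    start += len
    sub
  end
end

parts = split_evenly(0..46_000_000, 3)
p parts
#=> [0..15333333, 15333334..30666667, 30666668..46000000]
p parts.map(&:size)
#=> [15333334, 15333334, 15333333]
```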


sets of numpy 2D array rows having unique values in each row position

Consider array a, holding some of the permutations of 1,2,3,4. (my actual arrays may be larger)
import numpy as np
n = 3
a = np.array([[1, 2, 3, 4],
              [1, 2, 4, 3],
              [1, 3, 4, 2],
              [1, 4, 3, 2],
              [2, 3, 4, 1],
              [2, 4, 3, 1],
              [3, 1, 4, 2],
              [4, 1, 2, 3]])
I want to identify sets of n rows (in this example, n=3) where each row position holds unique values.
In this example, the output would be:
out = [[0, 4, 7],
[2, 5, 7],
[3, 4, 7]]
The 1st row of out indicates that a[0], a[4], and a[7] have unique values in each row position
When n = 2, there are 11 row pairs that match the criteria: [[0,4], [0,6], [0,7], [1,5] ...etc
When n = 4, there are 0 rows that match the criteria.
I'm new enough to python that I can't find a good way to approach this situation.
Solving this problem efficiently is far from easy. Indeed, the brute-force solution, which uses n nested loops, is very inefficient: its complexity is O(c r! / (r-n)!) where r is the number of rows of a and c is the number of columns of a (note that ! is the factorial). Since r is a number of permutations, which already grows exponentially with the number of unique items in a, the complexity of this solution is really bad.
A more efficient solution (but still not great) is to pick a row, filter the other rows down to those that can match it (i.e. no items at the same position are equal), and then recursively do the same thing n times (picking rows only from the filtered ones). The sets of row indices can be appended to a list during the recursion. It is hard to evaluate the complexity of this solution, but it is much faster in practice since most rows hopefully do not match each other and the number of filtered rows tends to decrease exponentially too. That being said, the complexity is certainly still exponential since the size of the output appears to grow exponentially too and the output needs to be written.
Here is the implementation:
def recursiveFindSets(a, i, n, availableRows, rowIndices, results):
    if availableRows.size == 0:
        return
    for k in availableRows:
        # Save the current choice
        rowIndices[i] = k
        # The next selected rows need to be bigger than `k` to prevent duplicates
        newAvailableRows = availableRows[availableRows > k]
        # If there are no solutions with a[k], then choose another
        if newAvailableRows.size == 0:
            continue
        # Find rows that differ from a[k] at every position
        goodMatches = np.all(a[newAvailableRows] != a[k], axis=1)
        # Find the location relative to `a` and not `a[availableRows]`
        newAvailableRows = newAvailableRows[goodMatches]
        # If there are no solutions with a[k], then choose another
        if newAvailableRows.size == 0:
            continue
        if i == n-2:
            # Generate some solutions from `newAvailableRows`
            for k2 in newAvailableRows:
                rowIndices[i+1] = k2
                results.append(rowIndices.copy())
        elif i < n-2:
            recursiveFindSets(a, i+1, n, newAvailableRows, rowIndices, results)
def findSets(a, n):
    availableRows = np.arange(a.shape[0], dtype=int) # Filter
    rowIndices = np.empty(n, dtype=np.int_) # Current set of row indices
    results = [] # List of all the sets
    recursiveFindSets(a, 0, n, availableRows, rowIndices, results)
    if len(results) == 0:
        return np.empty((0, n), dtype=int)
    return np.vstack(results)
findSets(a, 3)
# Output:
# array([[0, 4, 7],
# [2, 5, 7],
# [3, 4, 7]])
You can reduce this problem to finding all cliques of size n in an undirected graph. Nodes in the graph are given by row indices of a. There is an edge between i and j if (a[i] != a[j]).all().
Here is one implementation based on networkx. A function enumerate_all_cliques(g) iterates over cliques in g in order of increasing size. We discard all cliques of size less than n, keep those of size n, and stop once the first clique of size greater than n is found or cliques run out.
from itertools import combinations
import networkx as nx
def f(arr, n):
    nodes = np.arange(arr.shape[0])
    g = nx.Graph()
    g.add_nodes_from(nodes)
    for i, j in combinations(nodes, 2):
        if (arr[i] != arr[j]).all():
            g.add_edge(i, j)
    for c in nx.algorithms.clique.enumerate_all_cliques(g):
        lc = len(c)
        if lc == n:
            yield c
        elif lc > n:
            break
print(list(f(a, 3)))
# [[0, 4, 7], [2, 5, 7], [3, 4, 7]]
Here is another approach: find all maximal cliques and yield all subsets of size n from each clique. This can lead to double-counting, hence set is used before the return statement.
def f(arr, n):
    nodes = np.arange(arr.shape[0])
    g = nx.Graph()
    g.add_nodes_from(nodes)
    for i, j in combinations(nodes, 2):
        if (arr[i] != arr[j]).all():
            g.add_edge(i, j)
    cliques = set()
    # iterate over maximal cliques
    for c in nx.algorithms.clique.find_cliques(g):
        # update the clique set with the size-n subsets of c
        cliques.update(map(frozenset, combinations(c, n)))
    # return all cliques of size n without double-counting
    return [list(c) for c in cliques]
print(f(a, 3))
# [[2, 5, 7], [3, 4, 7], [0, 4, 7]]
The performance of either approach will vary depending on input values.

circularArrayRotation algorithm ruby

I am using hacker rank and I do not understand why my ruby code only works for one test case out of like 20. Here is the question:
John Watson knows of an operation called a right circular rotation on
an array of integers. One rotation operation moves the last array
element to the first position and shifts all remaining elements right
one. To test Sherlock's abilities, Watson provides Sherlock with an
array of integers. Sherlock is to perform the rotation operation a
number of times then determine the value of the element at a given
position.
For each array, perform a number of right circular rotations and
return the values of the elements at the given indices.
Function Description
Complete the circularArrayRotation function in the editor below.
circularArrayRotation has the following parameter(s):
int a[n]: the array to rotate
int k: the rotation count
int queries[1]: the indices to report
Returns
int[q]: the values in the rotated a as requested in m
Input Format
The first line contains 3 space-separated integers, n, k, and q, the number of elements in the integer array, the rotation count and the number of queries. The second line contains n space-separated integers,
where each integer i describes array element a[i] (where 0 <= i < n). Each of the q subsequent lines contains a single integer, queries[i], an index of an element
in a to return.
Constraints
Sample Input 0
3 2 3
1 2 3
0
1
2
Sample Output 0
2
3
1
Here is my code :
def circularArrayRotation(a, k, queries)
  q = []
  while k >= 1
    m = a.pop()
    a.unshift m
    k = k - 1
  end
  for i in queries do
    v = a[queries[i]]
    q.push v
  end
  return q
end
It only works for the sample test case but I can't figure out why. Thanks for any help you can provide.
Haven't run any benchmarks, but this seems like a job for the aptly named Array#rotate method:
def index_at_rotation(array, num_rotations, queries)
  array = array.rotate(-num_rotations)
  queries.map {|q| array[q]}
end
a = [1, 2, 3]
k = 2
q = [0,1, 2]
index_at_rotation(a, k, q)
#=> [2, 3, 1]
Handles negative rotation values and nil results as well:
a = [1, 6, 9, 11]
k = -1
q = (1..4).to_a
index_at_rotation(a, k, q)
#=> [9, 11, 1, nil]
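The bug in your code is in the query loop: for i in queries already yields each query value, so a[queries[i]] looks the index up twice; it should be a[i]. It passes the sample only because there queries[i] == i. I would also like to suggest a more efficient way of making the calculation.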
First observe that after q rotations the element at index i will be at index (i+q) % n.
For example, suppose
n = 3
a = [1,2,3]
q = 5
Then after q rotations the array will be as follows.
arr = Array.new(3)
arr[(0+5) % 3] = a[0] #=> arr[2] = 1
arr[(1+5) % 3] = a[1] #=> arr[0] = 2
arr[(2+5) % 3] = a[2] #=> arr[1] = 3
arr #=> [2,3,1]
We therefore can write
def doit(n,a,q,queries)
  n.times.with_object(Array.new(n)) do |i,arr|
    arr[(i+q) % n] = a[i]
  end.values_at(*queries)
end
doit(3,[1,2,3],5,[0,1,2])
#=> [2,3,1]
doit(3,[1,2,3],5,[2,1])
#=> [1, 3]
doit(3,[1,2,3],2,[0,1,2])
#=> [2, 3, 1]
p doit(3,[1,2,3],0,[0,1,2])
#=> [1,2,3]
doit(20,(0..19).to_a,25,(0..19).to_a.reverse)
#=> [14, 13, 12, 11, 10, 9, 8, 7, 6, 5,
# 4, 3, 2, 1, 0, 19, 18, 17, 16, 15]
Alternatively, we may observe that after q rotations the element at index j was initially at index (j-q) % n.
For the earlier example, after q rotations the array will be
[a[(0-5) % 3], a[(1-5) % 3], a[(2-5) % 3]]
#=> [a[1], a[2], a[0]]
#=> [2,3,1]
We therefore could instead write
def doit(n,a,q,queries)
  n.times.map { |j| a[(j-q) % n] }.values_at(*queries)
end
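Taking the second observation one step further, the rotated array need not be built at all: each query index can be mapped straight back to its original position. This is a sketch only; rotated_values is a name invented for the example, and k stands for the rotation count as in the problem statement.

```ruby
# Hypothetical helper: answers each query by index arithmetic alone,
# without rotating or copying the array.
def rotated_values(a, k, queries)
  n = a.size
  # After k right rotations, position j holds the element that started
  # at (j - k) % n. Ruby's % is non-negative for a positive n.
  queries.map { |j| a[(j - k) % n] }
end

p rotated_values([1, 2, 3], 2, [0, 1, 2])
#=> [2, 3, 1]
p rotated_values([1, 2, 3], 5, [2, 1])
#=> [1, 3]
```

The work is proportional to the number of queries and does not depend on how large k is.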

better multiple array sort, based on first array

I'm working to update the SVG::Graph gem, and have made many improvements to my version, but have found a bottleneck with multiple array sorting.
There is a "sort_multiple" function built in, which keeps an array of arrays (all of equal size) sorted by the first array in the group.
The issue I have is that this sort works well on truly random data, and really badly on sorted, or almost sorted data:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
  if lo < hi
    p = partition(arrys,lo,hi)
    sort_multiple(arrys, lo, p-1)
    sort_multiple(arrys, p+1, hi)
  end
  arrys
end
def partition( arrys, lo, hi )
  p = arrys[0][lo]
  l = lo
  z = lo+1
  while z <= hi
    if arrys[0][z] < p
      l += 1
      arrys.each { |arry| arry[z], arry[l] = arry[l], arry[z] }
    end
    z += 1
  end
  arrys.each { |arry| arry[lo], arry[l] = arry[l], arry[lo] }
  l
end
This routine appears to use a variant of the Lomuto partition scheme described on Wikipedia: https://en.wikipedia.org/wiki/Quicksort#Lomuto_partition_scheme
I have an array of 5000+ numbers, which is previously sorted, and this function adds about 1/2 second per chart.
I have modified the "sort_multiple" routine with the following:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
first = arrys.first
return arrys if first == first.sort
if lo < hi
...
which has "fixed" the problem with sorted data, but I was wondering if there is any way to utilise the better sort functions built into ruby to get this sort to work much quicker. e.g. do you think I could utilise a Tsort to speed this up? https://ruby-doc.org/stdlib-2.6.1/libdoc/tsort/rdoc/TSort.html
looking at my benchmarking, the completely random first group appears to be very fast.
Current benchmarking:
def sort_multiple( arrys, lo=0, hi=arrys[0].length-1 )
  if lo < hi
    p = partition(arrys,lo,hi)
    sort_multiple(arrys, lo, p-1)
    sort_multiple(arrys, p+1, hi)
  end
  arrys
end
def partition( arrys, lo, hi )
  p = arrys[0][lo]
  l = lo
  z = lo+1
  while z <= hi
    if arrys[0][z] < p
      l += 1
      arrys.each { |arry| arry[z], arry[l] = arry[l], arry[z] }
    end
    z += 1
  end
  arrys.each { |arry| arry[lo], arry[l] = arry[l], arry[lo] }
  l
end
first = (1..5400).map { rand }
second = (1..5400).map { rand }
unsorted_arrys = [first.dup, second.dup, Array.new(5400), Array.new(5400), Array.new(5400)]
sorted_arrys = [first.sort, second.dup, Array.new(5400), Array.new(5400), Array.new(5400)]
require 'benchmark'
Benchmark.bmbm do |x|
  x.report("unsorted") { sort_multiple( unsorted_arrys.map(&:dup) ) }
  x.report("sorted") { sort_multiple( sorted_arrys.map(&:dup) ) }
end
results:
Rehearsal --------------------------------------------
unsorted 0.070699 0.000008 0.070707 ( 0.070710)
sorted 0.731734 0.000000 0.731734 ( 0.731742)
----------------------------------- total: 0.802441sec
user system total real
unsorted 0.051636 0.000000 0.051636 ( 0.051636)
sorted 0.715730 0.000000 0.715730 ( 0.715733)
#EDIT#
Final accepted solution:
def sort( *arrys )
  new_arrys = arrys.transpose.sort_by(&:first).transpose
  new_arrys.each_index { |k| arrys[k].replace(new_arrys[k]) }
end
I have an array of 5000+ numbers, which is previously sorted, and this function adds about 1/2 second per chart.
Unfortunately, algorithms implemented in Ruby can become quite slow. It's often much faster to delegate the work to the built-in methods that are implemented in C, even if it comes with an overhead.
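Apart from interpreter overhead, the sorted case is slow for an algorithmic reason: with the first element as pivot, an already sorted input makes every partition maximally lopsided, so the number of comparisons grows quadratically. The sketch below counts comparisons for the same partition scheme on a single array of keys; count_comparisons is a name made up for this illustration, and n is kept small so the recursion stays shallow.

```ruby
# Hypothetical helper: runs the same first-element-pivot quicksort as
# sort_multiple on one array and returns how many comparisons it made.
def count_comparisons(keys, lo = 0, hi = keys.length - 1)
  return 0 if lo >= hi

  pivot = keys[lo]
  l = lo
  ((lo + 1)..hi).each do |z|
    next unless keys[z] < pivot

    l += 1
    keys[z], keys[l] = keys[l], keys[z]
  end
  keys[lo], keys[l] = keys[l], keys[lo]

  # hi - lo comparisons were made in this partition step
  (hi - lo) +
    count_comparisons(keys, lo, l - 1) +
    count_comparisons(keys, l + 1, hi)
end

n = 200
sorted_count = count_comparisons((1..n).to_a)
shuffled_count = count_comparisons((1..n).to_a.shuffle(random: Random.new(42)))
puts "sorted:   #{sorted_count}" # n * (n - 1) / 2 = 19900
puts "shuffled: #{shuffled_count}"
```

On sorted input each step only peels off the pivot, which gives the n * (n - 1) / 2 total; shuffled input needs far fewer comparisons.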
To sort a nested array, you could transpose it, then sort_by its first element, and transpose again afterwards:
arrays.transpose.sort_by(&:first).transpose
It works like this:
arrays #=> [[3, 1, 2], [:c, :a, :b]]
.transpose #=> [[3, :c], [1, :a], [2, :b]]
.sort_by(&:first) #=> [[1, :a], [2, :b], [3, :c]]
.transpose #=> [[1, 2, 3], [:a, :b, :c]]
And although it creates several temporary arrays along the way, the result seems to be an order of magnitude faster than the "unsorted" variant:
unsorted 0.035297 0.000106 0.035403 ( 0.035458)
sorted 0.474134 0.003065 0.477199 ( 0.480667)
transpose 0.001572 0.000082 0.001654 ( 0.001655)
In the long run, you could try to implement your algorithm as a C extension.
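If the double transpose is a concern, a variation is to compute the sort order of the first array once and apply it to every array with values_at. This is a sketch under the assumption that all arrays have the same length; sort_by_first! is a name invented here, and I have not benchmarked it against the transpose version.

```ruby
# Hypothetical helper: sorts every array in place by the order of the first one.
def sort_by_first!(arrays)
  keys = arrays.first
  # Indices of keys in ascending key order, e.g. [1, 2, 0] for [3, 1, 2]
  order = keys.each_index.sort_by { |i| keys[i] }
  arrays.each { |array| array.replace(array.values_at(*order)) }
end

arrays = [[3, 1, 2], [:c, :a, :b]]
sort_by_first!(arrays)
p arrays
#=> [[1, 2, 3], [:a, :b, :c]]
```

Like the transpose approach, the sorting itself is left to the built-in sort_by.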
I confess I don't fully understand the question and don't have the time to study the code at the link, but it seems that you have one sorted array that you are repeatedly mutating only slightly, and with each change you may mutate several other arrays, each a little or a lot. After each set of mutations you re-sort the first array and then rearrange each of the other arrays consistent with the changes in indices of elements in the first array.
If, for example, the first array were
arr = [2,4,6,8,10]
and the change to arr were to replace the element at index 1 (4) with 9 and the element at index 3 (8) with 3, arr would become [2,9,6,3,10], which, after re-sorting, would be [2,3,6,9,10]. We could do that as follows:
new_arr, indices = [2,9,6,3,10].each_with_index.sort.transpose
#=> [[2, 3, 6, 9, 10], [0, 3, 2, 1, 4]]
Therefore,
new_arr
#=> [2, 3, 6, 9, 10]
indices
#=> [0, 3, 2, 1, 4]
the intermediate calculation being
[2,9,6,3,10].each_with_index.sort
#=> [[2, 0], [3, 3], [6, 2], [9, 1], [10, 4]]
Considering that
new_arr == [2,9,6,3,10].values_at(*indices)
#=> true
we see that each of the other arrays, after having been mutated, can be sorted to conform with the sorting of indices in the first array with the following method, which is quite fast.
def sort_like_first(a, indices)
  a.values_at(*indices)
end
For example,
a = [5,4,3,1,2]
a.replace(sort_like_first a, indices)
a #=> [5, 1, 3, 4, 2]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first a, indices)
a #=> ["dog", "pig", "cow", "cat", "owl"]
In fact, it's not necessary to sort each of the other arrays until they are required in the calculations.
I would now like to consider a special case, namely, when only a single element in the first array is to be changed.
Suppose (as before)
arr = [2,4,6,8,10]
and the element at index 3 (8) is to be replaced with 5, resulting in [2,4,6,5,10]. A fast sort can be done with the following method, which employs a binary search.
def new_indices(arr, replace_idx, replace_val)
  new_loc = arr.bsearch_index { |n| n >= replace_val } || arr.size
  indices = (0..arr.size-1).to_a
  index_removed = indices.delete_at(replace_idx)
  new_loc -= 1 if new_loc > replace_idx
  indices.insert(new_loc, index_removed)
end
arr.bsearch_index { |n| n >= replace_val } returns nil if n >= replace_val #=> false for all n. It is for that reason I have tacked on || arr.size.
See Array#bsearch_index, Array#delete_at and Array#insert.
Let's try it. If
arr = [2,4,6,8,10]
replace_idx = 3
replace_val = 5
then
indices = new_indices(arr, replace_idx, replace_val)
#=> [0, 1, 3, 2, 4]
Only now can we replace the element of arr at index replace_idx.
arr[replace_idx] = replace_val
arr
#=> [2, 4, 6, 5, 10]
We see that the re-sorted array is as follows.
arr.values_at(*indices)
#=> [2, 4, 5, 6, 10]
The other arrays are sorted as before, using sort_like_first:
a = [5,4,3,1,2]
a.replace(sort_like_first(a, indices))
#=> [5, 4, 1, 3, 2]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first(a, indices))
#=> ["dog", "cat", "pig", "cow", "owl"]
Here's a second example.
arr = [2,4,6,8,10]
replace_idx = 3
replace_val = 12
indices = new_indices(arr, replace_idx, replace_val)
#=> [0, 1, 2, 4, 3]
arr[replace_idx] = replace_val
arr
#=> [2, 4, 6, 12, 10]
The first array sorted is therefore
arr.values_at(*indices)
#=> [2, 4, 6, 10, 12]
The other arrays are sorted as follows.
a = [5,4,3,1,2]
a.replace(sort_like_first a, indices)
a #=> [5, 4, 3, 2, 1]
a = %w|dog cat cow pig owl|
a.replace(sort_like_first a, indices)
a #=> ["dog", "cat", "cow", "owl", "pig"]

How to add amounts from an array of arrays

I have an array of arrays like so...
a1 = [[9, -1811.4], [8, 959.86], [7, -385], [6, -1731.39], [5, 806.78], [4, 2191.65]]
I need to get the average of the 2nd items(the amounts) from the total array.
So add -1811.4, 959.86, -385, -1731.39, 806.78 and 2191.65, then divide by the count (6).
I have tried...
a1.inject{ |month, amount| amount }.to_f / a1.size
This is not right and I can't see what I need to do.
a1.map(&:last).inject(:+) / a1.size.to_f
#=> 5.0833333333332575
Steps:
# 1. select last elements
a1.map(&:last)
#=> [-1811.4, 959.86, -385, -1731.39, 806.78, 2191.65]
# 2. sum them up
a1.map(&:last).inject(:+)
#=> 30.499999999999545
# 3. divide by the size of a1
a1.map(&:last).inject(:+) / a1.size.to_f
#=> 5.0833333333332575
One pass through a1 is sufficient.
a1.reduce(0) { |tot, (_,b)| tot + b }/a1.size.to_f
#=> 5.0833333333332575
The .to_f ensures floating-point division even when a1 contains only integer values.
The steps:
tot = a1.reduce(0) { |tot, (_,b)| tot + b }
#=> 30.499999999999545
n = a1.size.to_f
#=> 6.0
tot/n
#=> 5.0833333333332575
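On Ruby 2.4 or later, Array#sum with a block is another way to write the same single pass. A small sketch, using the a1 from the question; the last digits of the float result may differ slightly from the inject version because sum compensates for floating-point rounding, so the output is rounded here.

```ruby
a1 = [[9, -1811.4], [8, 959.86], [7, -385], [6, -1731.39], [5, 806.78], [4, 2191.65]]

# Sum the second element of each pair, then divide by the number of pairs.
# to_f keeps the division floating-point even if every amount is an integer.
average = a1.sum { |_month, amount| amount } / a1.size.to_f
puts average.round(2)
# 5.08
```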

Ruby - print result of inject sum on array

Part of the code below sums the elements of an array. How can I print the resulting sum of the array?
#!/usr/bin/ruby
a = [ 1, 2, 3, 4]
b = a.map { |x| x*x }
c = a.select { |x| x%2== 0 }
puts a.inject do | sum,x |
sum + x
end
puts a.inspect
puts b.inspect
puts c.inspect
The sum of a can be printed by wrapping the whole inject block in parentheses, making the resulting output the argument of puts:
a = [1, 2, 3, 4]
puts (
  a.inject do |sum, x|
    sum + x
  end
)
# => 10
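The parentheses are needed because of how blocks bind: a do ... end block attaches to the leftmost method call in the expression (puts here), so inject ends up being called without a block. Curly braces bind to the nearest call instead, so the following sketch works without extra parentheses.

```ruby
a = [1, 2, 3, 4]

# With braces the block belongs to inject, and puts receives the result.
total = a.inject { |sum, x| sum + x }
puts a.inject { |sum, x| sum + x }
# 10
```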
The above can be cleaned up a bit by assigning the sum of the array to a more descriptive variable, and/or by using the shorter inject syntax for summing. Your code could then look something like this:
a = [1, 2, 3, 4]
b = a.map { |x| x * x }
c = a.select { |x| x % 2 == 0 }
sum_a = a.inject(:+)
puts a.inspect
puts b.inspect
puts c.inspect
puts sum_a
# => [1, 2, 3, 4]
# => [1, 4, 9, 16]
# => [2, 4]
# => 10
Hope it helps!
Update:
As Cary pointed out in the comment below, additional improvements include condensing the c variable assignment to use a.select(&:even?) for filtering out integers divisible by 2, and using p [variable] instead of puts [variable].inspect.
